Testing the knowledge of artificial intelligence chatbots in pharmacology: examples of two groups of drugs

The study aimed to evaluate eight artificial intelligence chatbots (ChatGPT-3.5, Microsoft Copilot, Gemini, You.com, Perplexity, Character.ai, Claude 3.5, and ChatRTX) in answering questions related to two pharmacological topics taught during the basic pharmacology curriculum for medical students: a...

Full description

Saved in:
Bibliographic Details
Published inPeerJ. Computer science Vol. 11; p. e2954
Main Authors Granat, Marcin Mateusz, Paź, Aleksandra, Mirowska-Guzel, Dagmara
Format Journal Article
LanguageEnglish
Published PeerJ. Ltd 15.07.2025
PeerJ Inc
Subjects
Online AccessGet full text
ISSN2376-5992
2376-5992
DOI10.7717/peerj-cs.2954

Cover

More Information
Summary:The study aimed to evaluate eight artificial intelligence chatbots (ChatGPT-3.5, Microsoft Copilot, Gemini, You.com, Perplexity, Character.ai, Claude 3.5, and ChatRTX) in answering questions related to two pharmacological topics taught during the basic pharmacology curriculum for medical students: antifungal drugs and hypolipidemic drugs. Chatbots' performance was assessed by answering 60 single-choice questions on antifungal and hypolipidemic drugs topics. The questions were designed to have four answers (a, b, c, and d), and the artificial intelligence (AI) role was to choose the proper one. The assessment was performed twice with a 1-year hiatus to determine if artificial intelligence chatbots' effectiveness changed over time. All the answers were checked for being right or wrong according to up-to-date pharmacology knowledge. To improve the clarity of results, to each score, a mark was assigned based on the grading system applied in our unit. Statistica software version 13.3 and Microsoft Excel 2010 were used for statistical analysis. In 2023, the best results on the subject of antifungal drugs were obtained by Gemini (formerly Bard) and on the topic of hypolipidemic drugs by You.com (formerly YouChat). In 2024Microsoft Copilot answered correctly the highest number of questions in both topics. The total results of all artificial intelligence chatbots in 2023 and 2024 were compared using t-test for dependent samples. Statistical analysis revealed that artificial intelligence chatbots improved over time in both pharmacological topics, but this change was not statistically significant (p = 0.784 for antifungal drugs subject and p = 0.056 for hypolipidemic drugs). The accuracy of AI chatbots' responses regarding antifungal and hypolipidemic drugs improved over one year, though not significantly. None of the tested AI systems provided correct answers to all questions within these pharmacological fields.
ISSN:2376-5992
2376-5992
DOI:10.7717/peerj-cs.2954