Testing the knowledge of artificial intelligence chatbots in pharmacology: examples of two groups of drugs
Published in | PeerJ. Computer science Vol. 11; p. e2954 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | PeerJ. Ltd 15.07.2025; PeerJ Inc |
ISSN | 2376-5992 2376-5992 |
DOI | 10.7717/peerj-cs.2954 |
Summary: | The study aimed to evaluate eight artificial intelligence chatbots (ChatGPT-3.5, Microsoft Copilot, Gemini, You.com, Perplexity, Character.ai, Claude 3.5, and ChatRTX) in answering questions related to two pharmacological topics taught during the basic pharmacology curriculum for medical students: antifungal drugs and hypolipidemic drugs. The chatbots' performance was assessed using 60 single-choice questions on antifungal and hypolipidemic drugs. Each question had four answer options (a, b, c, and d), and the artificial intelligence (AI) chatbot had to choose the correct one. The assessment was performed twice, one year apart, to determine whether the chatbots' effectiveness changed over time. All answers were checked for correctness against up-to-date pharmacological knowledge. To improve the clarity of the results, each score was assigned a mark based on the grading system applied in our unit. Statistica software version 13.3 and Microsoft Excel 2010 were used for statistical analysis. In 2023, the best results for antifungal drugs were obtained by Gemini (formerly Bard), and for hypolipidemic drugs by You.com (formerly YouChat). In 2024, Microsoft Copilot answered the highest number of questions correctly in both topics. The total results of all artificial intelligence chatbots in 2023 and 2024 were compared using a t-test for dependent samples. Statistical analysis revealed that the chatbots' results improved over time in both pharmacological topics, but the change was not statistically significant (p = 0.784 for antifungal drugs and p = 0.056 for hypolipidemic drugs). The accuracy of AI chatbots' responses regarding antifungal and hypolipidemic drugs improved over one year, though not significantly. None of the tested AI systems provided correct answers to all questions within these pharmacological fields. |
---|---|
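The summary reports that the 2023 and 2024 totals were compared with a t-test for dependent samples. As a rough illustration of that comparison only, the following minimal sketch computes the paired t statistic and its degrees of freedom in plain Python. The function name `paired_t_statistic` and the score lists are hypothetical placeholders, not the study's code or data; the authors used Statistica and Excel, not this script.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a dependent-samples t-test.

    Each position in `before` and `after` must belong to the same subject
    (here: the same chatbot scored in two different years).
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-subject differences, which is what makes the test "paired".
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical placeholder scores for eight chatbots (NOT the study's results).
scores_2023 = [40, 45, 38, 50, 42, 36, 47, 41]
scores_2024 = [44, 46, 41, 49, 47, 40, 48, 45]

t, df = paired_t_statistic(scores_2023, scores_2024)
print(f"t = {t:.3f}, df = {df}")
```

Turning the statistic into a p-value requires the t distribution with `df` degrees of freedom. Assuming SciPy is available, `scipy.stats.ttest_rel` performs the whole test in one call.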