While GPT-3.5 is unable to pass the Physician Licensing Exam in Taiwan, GPT-4 successfully meets the criteria
Published in: Journal of the Chinese Medical Association, Vol. 88, No. 5, pp. 352-360
Format: Journal Article
Language: English
Published: Hagerstown, MD: Lippincott Williams & Wilkins, 01.05.2025
ISSN: 1726-4901; 1728-7731
DOI: 10.1097/JCMA.0000000000001225
Summary:
Background:
This study investigates the performance of ChatGPT-3.5 and ChatGPT-4 in answering medical questions from Taiwan's Physician Licensing Exam, ranging from basic medical knowledge to specialized clinical topics. It aims to understand these artificial intelligence (AI) models' capabilities in a non-English context, specifically traditional Chinese.
Methods:
The study incorporated questions from the Taiwan Physician Licensing Exam in 2022, excluding image-based queries. Each question was manually input into ChatGPT, and responses were compared with official answers from Taiwan's Ministry of Examination. Differences across specialties and question types were assessed using the Kruskal-Wallis and Fisher's exact tests.
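As an illustration of this kind of analysis (not taken from the study's own code), the following Python sketch shows how per-specialty correctness and question-type counts could be compared with SciPy's Kruskal-Wallis and Fisher's exact tests; all specialty names and data values here are hypothetical.

```python
# Illustrative sketch only: the specialty groupings and counts below are
# hypothetical, not data from the study.
from scipy.stats import kruskal, fisher_exact

# Per-question correctness (1 = ChatGPT answer matches the official answer),
# grouped by specialty.
results_by_specialty = {
    "anatomy":      [1, 1, 0, 1, 1, 0, 1, 1],
    "pharmacology": [1, 0, 1, 1, 0, 1, 1, 0],
    "surgery":      [0, 1, 0, 1, 0, 0, 1, 1],
}

# Kruskal-Wallis test: do correctness distributions differ across specialties?
h_stat, p_kw = kruskal(*results_by_specialty.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.3f}")

# Fisher's exact test on a 2x2 table of correct/incorrect counts for two
# question types (e.g., recall vs. case-based questions).
contingency = [[45, 5],    # question type A: correct, incorrect
               [38, 12]]   # question type B: correct, incorrect
odds_ratio, p_fisher = fisher_exact(contingency)
print(f"Fisher's exact test: odds ratio = {odds_ratio:.2f}, p = {p_fisher:.3f}")
```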
Results:
ChatGPT-3.5 achieved an average accuracy of 67.7% in basic medical sciences and 53.2% in clinical medicine. ChatGPT-4 significantly outperformed ChatGPT-3.5, with average accuracies of 91.9% in basic medical sciences and 90.7% in clinical medicine. ChatGPT-3.5 scored above 60.0% in seven of 10 basic medical science subjects and three of 14 clinical subjects, whereas ChatGPT-4 scored above 60.0% in every subject. The type of question did not significantly affect accuracy rates.
Conclusion:
ChatGPT-3.5 showed proficiency in basic medical sciences but was less reliable in clinical medicine, whereas ChatGPT-4 demonstrated strong capabilities in both areas. However, proficiency varied across specialties for both models. The type of question had minimal impact on performance. This study highlights the potential of AI models in medical education and in non-English-language examinations, as well as the need for cautious, informed implementation in educational settings given the variability across specialties.
Lay summary: This study tested how well two versions of an AI chatbot, ChatGPT-3.5 and ChatGPT-4, could answer questions from Taiwan's national medical licensing exam. The exam is required for all future doctors in Taiwan and includes both basic science and clinical medicine topics. Researchers entered questions from the 2022 exam into the chatbots and compared the answers with the official ones. They found that ChatGPT-3.5 performed fairly well on basic science questions but struggled with clinical medicine. In contrast, ChatGPT-4 did much better, scoring well above the passing mark in both parts of the exam. The results suggest that newer AI models like ChatGPT-4 may be useful in medical education, especially for studying or reviewing material. However, the AI's performance still varied between different medical topics, and it sometimes made mistakes. This shows AI can help in learning, but it shouldn't replace human teachers or doctors.
Bibliography:
Received September 3, 2023; accepted April 13, 2024.
Author contributions: Dr. Yu-Chun Chen and Dr. Tzeng-Ji Chen contributed equally to this work.
Conflicts of interest: Dr. Tzeng-Ji Chen and Dr. Yu-Chun Chen, editorial board members of the Journal of the Chinese Medical Association, had no role in the peer review process of or the decision to publish this article. The other authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.
*Address correspondence. Dr. Tzeng-Ji Chen, Department of Family Medicine, Taipei Veterans General Hospital Hsinchu Branch, 81, Section 1, Zhongfeng Road, Zhudong Township, Hsinchu 310, Taiwan, ROC. E-mail address: tjchen@vhct.gov.tw (T.-J. Chen); Dr. Yu-Chun Chen, Department of Family Medicine, Taipei Veterans General Hospital Yuli Branch, 91, Xinxing Street, Yuli Township, Hualien 981, Taiwan, ROC. E-mail address: yuchn.chen@gmail.com (Y.-C. Chen).