Performance of ChatGPT‐3.5 and ChatGPT‐4o in the Japanese National Dental Examination

Bibliographic Details
Published in: Journal of Dental Education, Vol. 89, No. 4, pp. 459-466
Main Authors: Uehara, Osamu; Morikawa, Tetsuro; Harada, Fumiya; Sugiyama, Nodoka; Matsuki, Yuko; Hiraki, Daichi; Sakurai, Hinako; Kado, Takashi; Yoshida, Koki; Murata, Yukie; Matsuoka, Hirofumi; Nagasawa, Toshiyuki; Furuichi, Yasushi; Abiko, Yoshihiro; Miura, Hiroko
Format: Journal Article
Language: English
Published: United States, 01.04.2025
ISSN: 0022-0337, 1930-7837
DOI: 10.1002/jdd.13766

Summary:
Objectives: In this study, we compared the performance of ChatGPT‐3.5 with that of ChatGPT‐4o on the Japanese National Dental Examination, which assesses clinical reasoning skills and dental knowledge, to determine their potential usefulness in dental education.
Methods: ChatGPT's performance was assessed using 1399 of the 2520 questions (55% of the exam) from the Japanese National Dental Examinations (111−117). The 1121 excluded questions (45% of the exam) contained figures or tables that ChatGPT could not recognize. The questions were categorized into 18 subjects by dental specialty. Statistical analysis was performed in SPSS, with McNemar's test applied to assess differences in performance.
Results: ChatGPT‐4o answered a significantly higher percentage of questions correctly (84.63%) than ChatGPT‐3.5 (45.46%), demonstrating improved reliability and subject knowledge. ChatGPT‐4o outperformed ChatGPT‐3.5 across all dental subjects, with significant improvements in subjects such as oral surgery, pathology, pharmacology, and microbiology. Heatmap analysis revealed that ChatGPT‐4o provided more stable and higher correct-answer rates, especially for complex subjects.
Conclusions: Advanced natural language processing models such as ChatGPT‐4o may have sufficiently advanced clinical reasoning skills and dental knowledge to function as a supplementary tool in dental education and exam preparation.
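McNemar's test, used in the Methods above, compares two models on the same paired set of questions by looking only at the discordant pairs (questions one model answered correctly and the other did not). A minimal sketch of the statistic follows; the discordant counts below are hypothetical placeholders for illustration, not figures reported in the study:

```python
def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-square statistic with continuity correction.

    b: questions answered correctly by model A but not model B
    c: questions answered correctly by model B but not model A
    Concordant pairs (both right or both wrong) do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical discordant counts for a set of paired exam questions:
# 40 answered correctly only by ChatGPT-3.5, 588 only by ChatGPT-4o.
stat = mcnemar_chi2(b=40, c=588)

# Compare against the chi-square critical value (1 degree of freedom,
# alpha = 0.05); a larger statistic indicates a significant difference.
significant = stat > 3.841
```

In practice a statistics package (e.g., SPSS, as in the study, or statsmodels' `mcnemar` function in Python) would report an exact p-value; the chi-square approximation above is adequate when the discordant counts are reasonably large.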