Performance of ChatGPT‐3.5 and ChatGPT‐4o in the Japanese National Dental Examination

Bibliographic Details
Published in: Journal of Dental Education, Vol. 89, No. 4, pp. 459-466
Main Authors: Uehara, Osamu; Morikawa, Tetsuro; Harada, Fumiya; Sugiyama, Nodoka; Matsuki, Yuko; Hiraki, Daichi; Sakurai, Hinako; Kado, Takashi; Yoshida, Koki; Murata, Yukie; Matsuoka, Hirofumi; Nagasawa, Toshiyuki; Furuichi, Yasushi; Abiko, Yoshihiro; Miura, Hiroko
Format: Journal Article
Language: English
Published: United States, 01.04.2025
ISSN: 0022-0337, 1930-7837
DOI: 10.1002/jdd.13766

Summary:
Objectives: In this study, we compared the performance of ChatGPT‐3.5 with that of ChatGPT‐4o on the Japanese National Dental Examination, which assesses clinical reasoning skills and dental knowledge, to determine their potential usefulness in dental education.
Methods: ChatGPT's performance was assessed using 1399 of the 2520 questions (55% of the exam) from the Japanese National Dental Examinations (111−117). The 1121 excluded questions (45% of the exam) contained figures or tables that ChatGPT could not recognize. The questions were categorized into 18 subjects by dental specialty. Statistical analysis was performed in SPSS, with McNemar's test applied to assess differences in performance.
Results: ChatGPT‐4o answered a significantly higher percentage of questions correctly (84.63%) than ChatGPT‐3.5 (45.46%), demonstrating improved reliability and subject knowledge. ChatGPT‐4o outperformed ChatGPT‐3.5 across all dental subjects, with significant improvements in subjects such as oral surgery, pathology, pharmacology, and microbiology. Heatmap analysis revealed that ChatGPT‐4o provided more stable and higher correct-answer rates, especially for complex subjects.
Conclusions: Advanced natural language processing models such as ChatGPT‐4o may have sufficiently advanced clinical reasoning skills and dental knowledge to function as a supplementary tool in dental education and exam preparation.
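McNemar's test, used in the Methods above, compares two models on the same paired set of questions by looking only at the discordant pairs (questions one model answered correctly and the other did not). A minimal sketch of the statistic follows; the discordant counts below are hypothetical placeholders for illustration, not figures reported in the study:

```python
def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-square statistic with continuity correction.

    b: questions answered correctly by model A but not model B
    c: questions answered correctly by model B but not model A
    Concordant pairs (both right or both wrong) do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical discordant counts for a set of paired exam questions:
# 40 answered correctly only by ChatGPT-3.5, 588 only by ChatGPT-4o.
stat = mcnemar_chi2(b=40, c=588)

# Compare against the chi-square critical value (1 degree of freedom,
# alpha = 0.05); a larger statistic indicates a significant difference.
significant = stat > 3.841
```

In practice a statistics package (e.g., SPSS, as in the study, or statsmodels' `mcnemar` function in Python) would report an exact p-value; the chi-square approximation above is adequate when the discordant counts are reasonably large.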