From Language Models to Medical Diagnoses: Assessing the Potential of GPT-4 and GPT-3.5-Turbo in Digital Health
Background: Large language models (LLMs) like GPT-3.5-Turbo and GPT-4 show potential to transform medical diagnostics through their linguistic and analytical capabilities. This study evaluates their diagnostic proficiency using English and German medical examination datasets. Methods: We analyzed 45...
Saved in:
Published in | AI (Basel) Vol. 5; no. 4; pp. 2680 - 2692 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
Basel
MDPI AG
01.12.2024
|
Subjects | |
Online Access | Get full text |
ISSN | 2673-2688 2673-2688 |
DOI | 10.3390/ai5040128 |
Cover
Summary: | Background: Large language models (LLMs) like GPT-3.5-Turbo and GPT-4 show potential to transform medical diagnostics through their linguistic and analytical capabilities. This study evaluates their diagnostic proficiency using English and German medical examination datasets. Methods: We analyzed 452 English and 637 German medical examination questions using GPT models. Performance metrics included broad and exact accuracy rates for primary and three-model generated guesses, with an analysis of performance against varying question difficulties based on student accuracy rates. Results: GPT-4 demonstrated superior performance, achieving up to 95.4% accuracy when considering approximate similarity in English datasets. While GPT-3.5-Turbo showed better results in English, GPT-4 maintained consistent performance across both languages. Question difficulty was correlated with diagnostic accuracy, particularly in German datasets. Conclusions: The study demonstrates GPT-4’s significant diagnostic capabilities and cross-linguistic flexibility, suggesting potential for clinical applications. However, further validation and ethical consideration are necessary before widespread implementation. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 2673-2688 2673-2688 |
DOI: | 10.3390/ai5040128 |