Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation

Bibliographic Details
Published in: Informatics and Health, Vol. 2, no. 2, pp. 158–169
Main Authors: Choi, Joon Yul; Yoo, Tae Keun
Format: Journal Article
Language: English
Published: Elsevier B.V.; KeAi Communications Co., Ltd, 01.09.2025
ISSN: 2949-9534
DOI: 10.1016/j.infoh.2025.07.002

More Information
Summary: Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates ChatGPT-4o’s performance in ophthalmic image interpretation, exemplar-guided reasoning (in-context learning), and code-free diagnostic tool generation using publicly available datasets. We assessed ChatGPT-4o through three clinically relevant tasks: (1) image interpretation without prior examples, using fundus, external ocular, and facial photographs representing key ophthalmic conditions; (2) in-context learning with example-based prompts to improve classification accuracy; and (3) generation of an interactive HTML-based decision-support tool from a clinical diagnostic algorithm. All evaluations were performed using open-access datasets without model fine-tuning.

When interpreting images without reference examples, ChatGPT-4o achieved diagnostic accuracies of 90.3 % for diabetic retinopathy, 77.4 % for age-related macular degeneration, 100 % for conjunctival melanoma, 97.3 % for pterygium, and 85.7 % for strabismus subtypes. In-context learning consistently improved diagnostic performance across all modalities, with strabismus classification reaching 100 % accuracy. Compared to EfficientNetB2, ChatGPT-4o demonstrated comparable or superior performance in several diagnostic tasks. Additionally, the model successfully translated schematic clinical algorithms into functional, browser-based diagnostic tools using natural language prompts alone.

ChatGPT-4o demonstrates promise in ophthalmic image interpretation and low-code clinical tool development, particularly when guided by in-context learning. However, these findings are based on a limited diagnostic spectrum and publicly available datasets. Broader clinical validation and head-to-head comparisons with domain-specific models are needed to establish its practical utility in ophthalmology.

Highlights:
• Traditional AI for ocular disease requires substantial technical skill and is hard to implement in clinical settings.
• ChatGPT-4o showed fair accuracy in classifying eye diseases using fundus, external, and facial images.
• In-context prompts improved diagnostic accuracy over unaided interpretation without reference images.
• The model built an HTML diagnostic tool from a clinical algorithm without any coding required.
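The example-guided (in-context learning) setup summarized above can be illustrated with a short sketch. This is a minimal illustration, not the authors' actual protocol: it assumes the OpenAI Python SDK, a GPT-4o vision-capable chat endpoint, and hypothetical local image files (example_dr.jpg as a labeled exemplar, query.jpg as the case to classify); the study's prompts, datasets, and evaluation procedure are not reproduced here.

```python
# Minimal sketch of exemplar-guided (in-context) image classification with GPT-4o.
# Assumptions: OpenAI Python SDK installed, OPENAI_API_KEY set in the environment,
# and hypothetical local images "example_dr.jpg" and "query.jpg".
import base64
from openai import OpenAI


def to_data_url(path: str) -> str:
    """Encode a local JPEG as a base64 data URL accepted by the chat API."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()


client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Labeled exemplar supplied as in-context guidance.
                {"type": "text",
                 "text": "Example fundus photograph with a confirmed diagnosis of "
                         "diabetic retinopathy:"},
                {"type": "image_url",
                 "image_url": {"url": to_data_url("example_dr.jpg")}},
                # Query image the model is asked to classify.
                {"type": "text",
                 "text": "Using the example above as a reference, classify the "
                         "following fundus photograph as 'diabetic retinopathy' or "
                         "'no diabetic retinopathy', and briefly justify the answer."},
                {"type": "image_url",
                 "image_url": {"url": to_data_url("query.jpg")}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

In this pattern, the zero-shot condition corresponds to omitting the exemplar text-and-image pair, while in-context learning simply prepends one or more labeled example images to the same query prompt.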