Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation
Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates...
Saved in:
| Published in | Informatics and Health Vol. 2; no. 2; pp. 158 - 169 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published |
Elsevier B.V
01.09.2025
KeAi Communications Co., Ltd |
| Subjects | |
| Online Access | Get full text |
| ISSN | 2949-9534 2949-9534 |
| DOI | 10.1016/j.infoh.2025.07.002 |
Cover
| Summary: | Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates ChatGPT-4o’s performance in ophthalmic image interpretation, exemplar-guided reasoning (in-context learning), and code-free diagnostic tool generation using publicly available datasets.
We assessed ChatGPT-4o through three clinically relevant tasks: (1) image interpretation without prior examples, using fundus, external ocular, and facial photographs representing key ophthalmic conditions; (2) in-context learning with example-based prompts to improve classification accuracy; and (3) generation of an interactive HTML-based decision-support tool from a clinical diagnostic algorithm. All evaluations were performed using open-access datasets without model fine-tuning
When interpreting images without reference examples, ChatGPT-4o achieved diagnostic accuracies of 90.3 % for diabetic retinopathy, 77.4 % for age-related macular degeneration, 100 % for conjunctival melanoma, 97.3 % for pterygium, and 85.7 % for strabismus subtypes. In-context learning consistently improved diagnostic performance across all modalities, with strabismus classification reaching 100 % accuracy. Compared to EfficientNetB2, ChatGPT-4o demonstrated comparable or superior performance in several diagnostic tasks. Additionally, the model successfully translated schematic clinical algorithms into functional, browser-based diagnostic tools using natural language prompts alone.
ChatGPT-4o demonstrates promise in ophthalmic image interpretation and low-code clinical tool development, particularly when guided by in-context learning. However, these findings are based on a limited diagnostic spectrum and publicly available datasets. Broader clinical validation and head-to-head comparisons with domain-specific models are needed to establish its practical utility in ophthalmology.
[Display omitted]
•Traditional AI for ocular disease needs high technical skill and is hard to implement in clinical settings.•ChatGPT-4o showed fair accuracy in classifying eye diseases using fundus, external, and facial images.•In-context prompts improved diagnostic accuracy over unaided interpretation without reference images.•The model built an HTML diagnostic tool from a clinical algorithm without any coding required. |
|---|---|
| ISSN: | 2949-9534 2949-9534 |
| DOI: | 10.1016/j.infoh.2025.07.002 |