Can a Natural Image-Based Foundation Model Outperform a Retina-Specific Model in Detecting Ocular and Systemic Diseases?

Bibliographic Details
Published in	Ophthalmology science (Online) p. 100923
Main Authors	Hou, Qingshan; Zhou, Yukun; Goh, Jocelyn Hui Lin; Zou, Ke; Yew, Samantha Min Er; Srinivasan, Sahana; Wang, Meng; Lo, Thaddaeus; Lei, Xiaofeng; Wagner, Siegfried K.; Chia, Mark A.; Yang, Gabriel Dawei; Jiang, Hongyang; Ran, AnRan; Santos, Rui; Somfai, Gabor Mark; Zhou, Juan Helen; Chen, Haoyu; Chen, Qingyu; Cheung, Carol Yim-Lui; Keane, Pearse A.; Tham, Yih Chung
Format	Journal Article
Language	English
Published	Elsevier Inc 01.08.2025
Subjects	Detection; Foundation models; Myocardial infarction; Ocular diseases; Ophthalmology; Prediction of stroke; Retinal images; Systemic diseases; Vision transformer
ISSN	2666-9145
DOI	10.1016/j.xops.2025.100923

Summary:	DINOv2 is a natural image-based foundation model (FM), pretrained exclusively on 142 million natural images from the LVD-142M data set. In contrast, RETFound is a retina-specific FM, pretrained on ∼3 million images comprising natural images, color fundus photos, and OCT images (∼1 million each). Despite DINOv2's massive pretraining data set, its application in ophthalmology and its performance relative to domain-specific FMs remain understudied. To address this gap, we conducted a head-to-head comparative evaluation of DINOv2 and RETFound across a range of downstream ocular and systemic disease tasks.
This was a retrospective head-to-head evaluation. Ocular disease detection tasks included diabetic retinopathy (DR), glaucoma, and multiclass eye diseases, while systemic disease incidence prediction focused on the 3-year incidence of heart failure, myocardial infarction, and ischemic stroke. Eight open-source data sets (APTOS-2019, IDRID, and MESSIDOR2 for DR; PAPILA and Glaucoma Fundus for glaucoma; JSIEC, Retina, and OCTID for multiclass eye diseases) and the Moorfields AlzEye data set (for systemic diseases) were used for fine-tuning and internal testing. External testing used the same open-source data sets in a cross-dataset validation setup and the UK Biobank (for systemic diseases).
We replicated the fine-tuning methodology from the original RETFound study on 3 DINOv2 models (large, base, and small). All models were fine-tuned on the respective data sets and evaluated through internal and external testing. The area under the receiver operating characteristic curve (AUC) and 2-sided t-tests were used to compare the models' performance.
For ocular disease detection, DINOv2 models generally outperformed RETFound. For DR, DINOv2-large achieved AUCs of 0.850 to 0.952, exceeding RETFound's 0.823 to 0.944 (all P ≤ 0.007). For multiclass eye diseases, DINOv2-large (AUC = 0.892, Retina data set) surpassed RETFound (AUC = 0.846, P < 0.001). For glaucoma, DINOv2-base (AUC = 0.958, Glaucoma Fundus data set) outperformed RETFound (AUC = 0.940, P < 0.001). Conversely, for systemic disease incidence prediction, RETFound achieved superior AUCs of 0.796 (heart failure), 0.732 (myocardial infarction), and 0.754 (ischemic stroke), outperforming the best DINOv2 models (AUCs of 0.663–0.771, all P < 0.001). This trend persisted in external validation.
Our findings reveal the merits of DINOv2 in ocular disease detection tasks, while RETFound demonstrates an edge in systemic disease incidence prediction. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimize clinical performance. Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.