COADBench: A benchmark for revealing the relationship between AI models and clinical outcomes

Bibliographic Details
Published in: BenchCouncil Transactions on Benchmarks, Standards and Evaluations, Vol. 4, no. 4, p. 100198
Main Authors: Xie, Jiyue; Liu, Wenjing; Ma, Li; Yao, Caiqin; Liang, Qi; Tang, Suqin; Huang, Yunyou
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2024
ISSN: 2772-4859
DOI: 10.1016/j.tbench.2025.100198

Summary: Alzheimer’s disease (AD) is irreversible and imposes a severe social burden, and it has therefore attracted significant attention from AI researchers. Numerous auxiliary diagnostic models have been developed with the aim of improving AD diagnostic services and thereby reducing that burden. However, because the clinical value of these models has not been validated, no AD diagnostic model has been widely accepted by clinicians or officially approved for use in AD diagnostic services. The clinical value of traditional medical devices is validated through rigorous randomized controlled trials that demonstrate their impact on clinical outcomes. In contrast, current AD diagnostic models are validated only on their accuracy, and the relationship between these models and patient outcomes remains unknown. This gap has hindered the acceptance and clinical use of AD diagnostic models by healthcare professionals. To address this issue, we introduce COADBench, a benchmark centered on clinical outcomes for evaluating the clinical value of AD diagnostic models. COADBench selects subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database who have at least two cognitive score records from different follow-up visits; cognitive scores are the most commonly used clinical endpoint in AD clinical trials. To the best of our knowledge, it is the first benchmark to link subjects’ cognitive scores with model performance, using patient cognitive scores as the post-intervention clinical outcome against which models are evaluated. Benchmarking current mainstream AD diagnostic algorithms with COADBench, we find no significant correlation between subjects’ cognitive improvement and model performance, which means that the performance criteria currently used to evaluate mainstream AD diagnostic algorithms are not aligned with clinical value.
• For the first time, clinical value is considered in model evaluation.
• A benchmarking framework is introduced to assess the clinical value of models.
• The proposed framework identifies critical issues in current evaluation methods.
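
The summary's central check, whether model performance tracks subjects' cognitive change, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the summary does not say which statistic COADBench uses, so Pearson's r is a stand-in; the function names `score_change` and `pearson_r` are hypothetical; and the numbers are made-up toy values, not results from the paper or from ADNI.

```python
from math import sqrt


def score_change(first_visit: float, later_visit: float) -> float:
    """Change in a cognitive score between two follow-up visits.

    Whether a positive change means improvement depends on the scale
    in use; this sketch makes no assumption about direction.
    """
    return later_visit - first_visit


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values only: one accuracy and one mean score change per
    # hypothetical model. These are NOT figures from the paper.
    accuracies = [0.78, 0.82, 0.85, 0.88, 0.91]
    mean_changes = [-1.2, -0.4, -1.5, -0.6, -1.0]
    print(f"r = {pearson_r(accuracies, mean_changes):.3f}")
```

A coefficient near zero on real data would match the pattern the summary reports, although a claim of "no significant correlation" also requires a significance test, which this sketch omits.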