COADBench: A benchmark for revealing the relationship between AI models and clinical outcomes
| Published in | BenchCouncil Transactions on Benchmarks, Standards and Evaluations Vol. 4; no. 4; p. 100198 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V, 01.12.2024 |
| ISSN | 2772-4859 |
| DOI | 10.1016/j.tbench.2025.100198 |
| Summary: | Alzheimer’s disease (AD), due to its irreversible nature and the severe social burden it causes, has garnered significant attention from AI researchers. Numerous auxiliary diagnostic models have been developed with the aim of improving AD diagnostic services and thereby reducing this burden. However, because the clinical value of these models has not been validated, no AD diagnostic model has been widely accepted by clinicians or officially approved for use in AD diagnostic services. The clinical value of traditional medical devices is validated through rigorous randomized controlled trials that demonstrate their impact on clinical outcomes. In contrast, current AD diagnostic models are validated only on their accuracy, and the relationship between these models and patient outcomes remains unknown. This gap has hindered the acceptance and clinical use of AD diagnostic models by healthcare professionals. To address this issue, we introduce COADBench, a benchmark centered on clinical outcomes for evaluating the clinical value of AD diagnostic models. COADBench curated subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database who have at least two cognitive score records (the most commonly used clinical endpoint in AD clinical trials) from different follow-up visits. To the best of our knowledge, it is the first benchmark to link subjects’ cognitive scores with model performance, using the cognitive scores recorded after intervention as clinical outcomes for evaluating the models. Benchmarking current mainstream AD diagnostic algorithms with COADBench, we find no significant correlation between subjects’ cognitive improvement and model performance, which indicates that the performance criteria currently used to evaluate mainstream AD diagnostic algorithms are not aligned with clinical value. |
|---|---|
| Highlights: | • For the first time, clinical value is considered in model evaluation. • A benchmarking framework is introduced to assess the clinical value of models. • The proposed framework identifies critical issues in current evaluation methods. |