External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review
| Published in | Radiology. Artificial intelligence Vol. 4; no. 3; p. e210064 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | United States: Radiological Society of North America, 01.05.2022 |
| ISSN | 2638-6100 |
| DOI | 10.1148/ryai.210064 |
| Summary: | Purpose: To assess generalizability of published deep learning (DL) algorithms for radiologic diagnosis. Materials and Methods: In this systematic review, the PubMed database was searched for peer-reviewed studies of DL algorithms for image-based radiologic diagnosis that included external validation, published from January 1, 2015, through April 1, 2021. Studies using nonimaging features or incorporating non-DL methods for feature extraction or classification were excluded. Two reviewers independently evaluated studies for inclusion, and any discrepancies were resolved by consensus. Internal and external performance measures and pertinent study characteristics were extracted, and relationships among these data were examined using nonparametric statistics. Results: Eighty-three studies reporting 86 algorithms were included. The vast majority (70 of 86, 81%) reported at least some decrease in external performance compared with internal performance, with nearly half (42 of 86, 49%) reporting at least a modest decrease (≥0.05 on the unit scale) and nearly a quarter (21 of 86, 24%) reporting a substantial decrease (≥0.10 on the unit scale). No study characteristics were found to be associated with the difference between internal and external performance. Conclusion: Among published external validation studies of DL algorithms for image-based radiologic diagnosis, the vast majority demonstrated diminished algorithm performance on the external dataset, with some reporting a substantial performance decrease. Keywords: Meta-Analysis, Computer Applications-Detection/Diagnosis, Neural Networks, Computer Applications-General (Informatics), Epidemiology, Technology Assessment, Diagnosis, Informatics. © RSNA, 2022. |
|---|---|
| Bibliography: | Author contributions: Guarantors of integrity of entire study, A.C.Y., J.E.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, all authors; clinical studies, A.C.Y.; statistical analysis, B.M., J.E.; and manuscript editing, all authors. |
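
The thresholds quoted in the summary (any decrease, ≥0.05, and ≥0.10 on the unit scale) can be illustrated with a minimal Python sketch. The function names and the example performance pairs below are hypothetical and are not taken from the review; the summary does not state which performance measure each study used, so the values simply stand for any metric on a 0 to 1 scale. Only the two thresholds and the reported counts (70, 42, and 21 of 86) come from the record above.

```python
from typing import Dict, Iterable, Tuple

MODEST = 0.05       # "modest decrease" threshold quoted in the summary
SUBSTANTIAL = 0.10  # "substantial decrease" threshold quoted in the summary


def tally_decreases(pairs: Iterable[Tuple[float, float]]) -> Dict[str, int]:
    """Count algorithms with any, modest, and substantial performance decrease.

    Each pair is (internal, external) performance on the unit scale.
    The categories are cumulative: a substantial decrease also counts
    as a modest decrease and as "any" decrease.
    """
    counts = {"any": 0, "modest": 0, "substantial": 0}
    for internal, external in pairs:
        # Round to suppress floating-point noise right at the thresholds.
        drop = round(internal - external, 6)
        if drop > 0:
            counts["any"] += 1
        if drop >= MODEST:
            counts["modest"] += 1
        if drop >= SUBSTANTIAL:
            counts["substantial"] += 1
    return counts


def percent(count: int, total: int) -> int:
    """Share of the total as a whole-number percentage."""
    return round(100 * count / total)


# Hypothetical (internal, external) pairs, for illustration only.
example_pairs = [(0.95, 0.96), (0.90, 0.88), (0.90, 0.85), (0.92, 0.80)]
example_counts = tally_decreases(example_pairs)

# Proportions recomputed from the counts reported in the summary.
reported = {name: percent(n, 86) for name, n in
            [("any", 70), ("modest", 42), ("substantial", 21)]}
```

Recomputing the shares from the reported counts with `percent` reproduces the percentages given in the summary (81%, 49%, and 24%).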