Artificial intelligence diagnostic accuracy in fracture detection from plain radiographs and comparing it with clinicians: a systematic review and meta-analysis

Bibliographic Details
Published in Clinical radiology Vol. 79; no. 8; pp. 579-588
Main Authors Nowroozi, A., Salehi, M.A., Shobeiri, P., Agahi, S., Momtazmanesh, S., Kaviani, P., Kalra, M.K.
Format Journal Article
Language English
Published England Elsevier Ltd 01.08.2024
ISSN 0009-9260, 1365-229X
DOI 10.1016/j.crad.2024.04.009

More Information
Summary: Fracture detection is one of the most commonly used and studied applications of artificial intelligence (AI) in medicine. In this systematic review and meta-analysis, we aimed to summarize the available literature and data on AI performance in fracture detection on plain radiographs and the factors affecting it. We systematically reviewed studies evaluating AI algorithms for detecting bone fractures on plain radiographs, pooled their performance using meta-analysis (a bivariate regression approach), and compared it with that of clinicians. We also analyzed factors potentially affecting algorithm performance using meta-regression. Our analysis included 100 studies. In the 83 studies that reported confusion matrices, AI algorithms showed a sensitivity of 91.43% and a specificity of 92.12% (area under the summary receiver operating characteristic curve = 0.968). After adjustment and false discovery rate (FDR) correction, tibia/fibula (excluding ankle) fractures were associated with higher AI sensitivity (7.0%, p=0.004), while more recent publications (5.5%, p=0.003) and the Xception architecture (6.6%, p<0.001) were associated with higher specificity. Clinicians and AI showed similar specificity in fracture identification, although AI tended toward higher sensitivity (7.6%, p=0.07). Radiologists, on the other hand, were more specific than AI overall and in several subgroups, and were more sensitive to hip fractures before FDR correction. Currently available AI aids could significantly improve care where radiologists are not readily available. Moreover, identifying the factors that affect algorithm performance could guide AI development teams in optimizing their products.
• Studies assessing AI in fracture detection have high degrees of bias.
• AI showed a pooled sensitivity and specificity of >90% in detecting fractures.
• Various factors, including fracture site and architecture, affect AI's performance.
• Radiologists remain superior to algorithms in fracture detection on X-rays.
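As a reading aid for the metrics named in the summary, the sketch below shows how sensitivity and specificity follow from a single study's confusion matrix, and how a set of p-values can be adjusted for the false discovery rate. It is a minimal illustration under stated assumptions, not the authors' method: the pooled estimates in the record come from a bivariate regression model rather than from per-study arithmetic, the record does not name the FDR procedure (Benjamini-Hochberg is assumed here), and the function names and all counts and p-values are hypothetical.

```python
from typing import List, Tuple


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Per-study sensitivity and specificity from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # share of fractured cases flagged as fractured
    specificity = tn / (tn + fp)  # share of non-fractured cases cleared
    return sensitivity, specificity


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the record only says "false discovery rate correction";
    Benjamini-Hochberg is one common procedure, used here for illustration.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical study: 100 fractured and 100 non-fractured radiographs.
sens, spec = sensitivity_specificity(tp=90, fp=8, fn=10, tn=92)

# Hypothetical raw p-values from three meta-regression covariates.
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03])
```

In this toy example the study has a sensitivity of 0.90 and a specificity of 0.92; a covariate would count as significant after correction only if its adjusted p-value stays below the chosen threshold.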