Commercially-available AI algorithm improves radiologists’ sensitivity for wrist and hand fracture detection on X-ray, compared to a CT-based ground truth

Bibliographic Details
Published in European radiology Vol. 34; no. 5; pp. 2885-2894
Main Authors Jacques, Thibaut; Cardot, Nicolas; Ventre, Jeanne; Demondion, Xavier; Cotten, Anne
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.05.2024
Springer Nature B.V
Springer Verlag
ISSN 1432-1084
0938-7994
DOI 10.1007/s00330-023-10380-1

Summary:
Objectives: Algorithms for fracture detection are spreading in clinical practice, but the use of X-ray-only ground truth can bias their evaluation. This study assessed radiologists’ performance in detecting wrist and hand fractures on radiographs with the aid of a commercially available algorithm, compared to a computerized tomography (CT) ground truth.
Methods: Post-traumatic hand and wrist CT and concomitant X-ray examinations were retrospectively gathered. Radiographs were labeled based on CT findings. The dataset comprised 296 consecutive cases: 118 normal (39.9%) and 178 pathological (60.1%), with a total of 267 fractures visible on CT. Twenty-three radiologists with various levels of experience reviewed all radiographs, first without AI and then with it, blinded to CT results.
Results: Using AI improved radiologists’ sensitivity (Se, 0.658 to 0.703, p < 0.0001) and negative predictive value (NPV, 0.585 to 0.618, p < 0.0001), without affecting their specificity (Sp, 0.885 vs 0.891, p = 0.91) or positive predictive value (PPV, 0.887 vs 0.899, p = 0.08). On the radiographic dataset, based on the CT ground truth, stand-alone AI performance was 0.771 (Se), 0.898 (Sp), 0.684 (NPV), 0.915 (PPV), and 0.764 (AUROC). These values are lower than previously reported, suggesting a potential underestimation of the number of missed fractures in the AI literature.
Conclusions: AI enabled radiologists to improve their sensitivity and negative predictive value for wrist and hand fracture detection on radiographs, without affecting their specificity or positive predictive value, compared to a CT-based ground truth. Using CT as the gold standard for X-ray labels is innovative, leading to algorithm performance poorer than reported elsewhere, but probably closer to clinical reality.
Clinical relevance statement: Using an AI algorithm significantly improved radiologists’ sensitivity and negative predictive value in detecting wrist and hand fractures on radiographs, with ground truth labels based on CT findings.
Key Points:
• Using CT as a ground truth for labeling X-rays is new in AI literature, and led to algorithm performance significantly poorer than reported elsewhere (AUROC: 0.764), but probably closer to clinical reality.
• AI enabled radiologists to significantly improve their sensitivity (+4.5%) and negative predictive value (+3.3%) for the detection of wrist and hand fractures on X-rays.
• There was no significant change in terms of specificity or positive predictive value.
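For readers less familiar with the four metrics quoted in the summary, the sketch below shows their standard definitions from a 2×2 confusion matrix and reproduces the reported gains as absolute differences. The `ConfusionCounts` class, the `diagnostic_metrics` helper, and the example counts are hypothetical and chosen only for illustration; they are not the study's data or code. Only the values in `reported` are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary reading against the reference standard (here, CT)."""
    tp: int  # fracture on CT, fracture called on X-ray
    fp: int  # no fracture on CT, fracture called on X-ray
    tn: int  # no fracture on CT, no fracture called on X-ray
    fn: int  # fracture on CT, missed on X-ray


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions of Se, Sp, PPV and NPV."""
    return {
        "Se": c.tp / (c.tp + c.fn),   # sensitivity
        "Sp": c.tn / (c.tn + c.fp),   # specificity
        "PPV": c.tp / (c.tp + c.fp),  # positive predictive value
        "NPV": c.tn / (c.tn + c.fn),  # negative predictive value
    }


# Hypothetical counts, NOT taken from the study.
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = diagnostic_metrics(example)

# Radiologists' pooled values (without AI, with AI) as quoted in the abstract.
reported = {
    "Se": (0.658, 0.703),
    "NPV": (0.585, 0.618),
    "Sp": (0.885, 0.891),
    "PPV": (0.887, 0.899),
}

# Absolute change in percentage points, rounded to one decimal place.
gains = {
    name: round((with_ai - without_ai) * 100, 1)
    for name, (without_ai, with_ai) in reported.items()
}

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name} (hypothetical counts): {value:.3f}")
    for name, gain in gains.items():
        print(f"{name} change with AI: {gain:+.1f} points")
```

The Se and NPV differences come out at 4.5 and 3.3 points, which correspond to the figures given in the Key Points; read this way, the percentages there are absolute differences rather than relative improvements.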