Real-world clinical impact of three commercial AI algorithms on musculoskeletal radiography interpretation: A prospective crossover reader study


Bibliographic Details
Published in: International journal of medical informatics (Shannon, Ireland), Vol. 205, p. 106120
Main Authors: Prucker, Philipp, Lemke, Tristan, Mertens, Christian J., Ziegelmayer, Sebastian, Graf, Markus M., Weller, Dominik, Kim, Su Hwan, Gassert, Florian T., Kader, Avan, Dorfner, Felix J., Meddeb, Aymen, Makowski, Marcus R., Lammert, Jacqueline, Huber, Thomas, Lohöfer, Fabian, Bressem, Keno K., Adams, Lisa C., Luiken, Ina, Busch, Felix
Format: Journal Article
Language: English
Published: Ireland, Elsevier B.V., 01.01.2026
ISSN: 1386-5056, 1872-8243
DOI: 10.1016/j.ijmedinf.2025.106120

Summary:
• AI-assisted interpretation of musculoskeletal radiography had comparable accuracy.
• AI algorithms reduced interpretation time for both readers (all tools; p < 0.001).
• Reader confidence increased across tools on a 5-point Likert scale.
• Additional CT recommendations decreased for one reader.
• Senior consults remained unchanged for both readers.

To prospectively assess the diagnostic performance, workflow efficiency, and clinical impact of three commercial deep-learning tools (BoneView, Rayvolve, RBfracture) for routine musculoskeletal radiograph interpretation.

From January to March 2025, two radiologists (4 and 5 years’ experience) independently interpreted 1,037 adult musculoskeletal studies (2,926 radiographs), first unaided and, after 14-day washouts, with each AI tool in a randomized crossover design. Ground truth was established by confirmatory CT when available. Outcomes included sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC), interpretation time, diagnostic confidence (5-point Likert scale), and rates of additional CT recommendations and senior consultations. DeLong tests compared AUCs; Mann–Whitney U and χ² tests assessed secondary endpoints.

AI assistance did not significantly change performance for fractures, dislocations, or effusions. For fractures, AUCs were comparable to baseline (Reader 1: 96.50 % vs. 96.30–96.50 %; Reader 2: 95.35 % vs. 95.97 %; all p > 0.11). For dislocations, baseline AUCs (Reader 1: 92.66 %; Reader 2: 90.68 %) were unchanged with AI (92.76–93.95 % and 92.00 %; p ≥ 0.280). For effusions, baseline AUCs (Reader 1: 92.52 %; Reader 2: 96.75 %) were similar with AI (93.12 % and 96.99 %; p ≥ 0.157). Median interpretation times decreased with AI (Reader 1: 34 s to 21–25 s; Reader 2: 30 s to 21–26 s; all p < 0.001). Confidence improved across tools: BoneView increased combined “very good/excellent” ratings versus unaided reads (Reader 1: 509 vs. 449, p < 0.001; Reader 2: 483 vs. 439, p < 0.001); Rayvolve (Reader 1: 456 vs. 449, p = 0.029; Reader 2: 449 vs. 439, p < 0.001) and RBfracture (Reader 1: 457 vs. 449, p = 0.017; Reader 2: 448 vs. 439, p = 0.001) yielded smaller but significant gains. Reader 1 recommended fewer CT scans with AI assistance (33 vs. 22–23, p = 0.007).

In a real-world clinical setting, AI-assisted interpretation of musculoskeletal radiographs reduced reading time and increased diagnostic confidence without materially affecting diagnostic performance. These findings support AI assistance as a lever for workflow efficiency and potential cost-effectiveness at scale.
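As an illustration of the statistical workflow described in the summary, the Python sketch below shows how the AUCs and the secondary endpoints (Mann–Whitney U for interpretation times, χ² for confidence-rating counts) could be computed. All variable names and the synthetic data are assumptions for demonstration only, not the study’s data or code, and the 1,037-study denominator for the rating counts is likewise an assumption. DeLong’s paired comparison of AUCs is not shown because it is not available in SciPy and would require a dedicated implementation (e.g., the pROC package in R).

import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for one reader's unaided vs. AI-assisted reads (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                 # e.g., CT-confirmed finding present/absent
score_unaided = 0.70 * y_true + 0.5 * rng.random(200)  # unaided suspicion score
score_ai = 0.75 * y_true + 0.5 * rng.random(200)       # AI-assisted suspicion score

# Primary endpoint: area under the ROC curve per reading condition.
auc_unaided = roc_auc_score(y_true, score_unaided)
auc_ai = roc_auc_score(y_true, score_ai)

# Secondary endpoint: interpretation times (seconds) compared with the Mann-Whitney U test.
t_unaided = rng.normal(34, 6, size=200)
t_ai = rng.normal(23, 5, size=200)
_, p_time = mannwhitneyu(t_unaided, t_ai, alternative="two-sided")

# Secondary endpoint: counts of combined "very good/excellent" confidence ratings,
# compared with a chi-square test (509 vs. 449 as reported for Reader 1 with BoneView;
# the 1,037-study denominator is an assumption).
table = np.array([[509, 1037 - 509],   # AI-assisted: high vs. lower confidence
                  [449, 1037 - 449]])  # unaided
chi2, p_conf, _, _ = chi2_contingency(table)

print(f"AUC unaided {auc_unaided:.3f} vs. AI-assisted {auc_ai:.3f}; "
      f"time p = {p_time:.3g}; confidence p = {p_conf:.3g}")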