Real-world clinical impact of three commercial AI algorithms on musculoskeletal radiography interpretation: A prospective crossover reader study
| Published in | International Journal of Medical Informatics (Shannon, Ireland), Vol. 205, p. 106120 | 
|---|---|
| Main Authors | , , , , , , , , , , , , , , , , , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Ireland: Elsevier B.V., 01.01.2026 | 
| Subjects | |
| ISSN | 1386-5056, 1872-8243 | 
| DOI | 10.1016/j.ijmedinf.2025.106120 | 
| Summary: | • AI-assisted interpretation of musculoskeletal radiography had comparable accuracy. • AI algorithms reduced interpretation time for both readers (all tools; p < 0.001). • Reader confidence increased across tools on a 5-point Likert scale. • Additional CT recommendations decreased for one reader. • Senior consults remained unchanged for both readers.
To prospectively assess the diagnostic performance, workflow efficiency, and clinical impact of three commercial deep-learning tools (BoneView, Rayvolve, RBfracture) for routine musculoskeletal radiograph interpretation.
From January to March 2025, two radiologists (4 and 5 years’ experience) independently interpreted 1,037 adult musculoskeletal studies (2,926 radiographs), first unaided and, after 14-day washouts, with each AI tool in a randomized crossover design. Ground truth was established by confirmatory CT when available. Outcomes included sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC), interpretation time, diagnostic confidence (5-point Likert), and rates of additional CT recommendations and senior consultations. DeLong tests compared AUCs; Mann–Whitney U and χ² tests assessed secondary endpoints.
AI assistance did not significantly change performance for fractures, dislocations, or effusions. For fractures, AUCs were comparable to baseline (Reader 1: 96.50 % vs. 96.30–96.50 %; Reader 2: 95.35 % vs. 95.97 %; all p > 0.11). For dislocations, baseline AUCs (Reader 1: 92.66 %; Reader 2: 90.68 %) were unchanged with AI (92.76–93.95 % and 92.00 %; p ≥ 0.280). For effusions, baseline AUCs (Reader 1: 92.52 %; Reader 2: 96.75 %) were similar with AI (93.12 % and 96.99 %; p ≥ 0.157). Median interpretation times decreased with AI (Reader 1: 34 s to 21–25 s; Reader 2: 30 s to 21–26 s; all p < 0.001). Confidence improved across tools: BoneView increased combined “very good/excellent” ratings versus unaided reads (Reader 1: 509 vs. 449, p < 0.001; Reader 2: 483 vs. 439, p < 0.001); Rayvolve (Reader 1: 456 vs. 449, p = 0.029; Reader 2: 449 vs. 439, p < 0.001) and RBfracture (Reader 1: 457 vs. 449, p = 0.017; Reader 2: 448 vs. 439, p = 0.001) yielded smaller but significant gains. Reader 1 recommended fewer CT scans with AI assistance (33 vs. 22–23, p = 0.007).
In a real-world clinical setting, AI-assisted interpretation of musculoskeletal radiographs reduced reading time and increased diagnostic confidence without materially affecting diagnostic performance. These findings support AI assistance as a lever for workflow efficiency and potential cost-effectiveness at scale. | 
|---|---|
| ISSN: | 1386-5056, 1872-8243 | 
| DOI: | 10.1016/j.ijmedinf.2025.106120 |
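The Methods describe the statistical comparisons only at a high level (DeLong tests for AUCs; Mann–Whitney U and χ² tests for secondary endpoints). The sketch below is not taken from the paper: it runs the same kinds of tests on simulated data, and all variable names, sample values, and distributions are assumptions made purely for illustration.

```python
# Illustrative sketch only: simulated data, not the study's dataset.
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_studies = 1037  # number of studies reported in the abstract

# --- Interpretation time (seconds), unaided vs. AI-assisted (simulated) ---
time_unaided = rng.gamma(shape=4.0, scale=8.5, size=n_studies)  # roughly 30-34 s
time_ai = rng.gamma(shape=4.0, scale=5.8, size=n_studies)       # roughly 21-26 s
u_stat, p_time = mannwhitneyu(time_unaided, time_ai, alternative="two-sided")
print(f"Mann-Whitney U: U={u_stat:.0f}, p={p_time:.4g}")

# --- Additional CT recommendations, unaided vs. AI-assisted (simulated) ---
# Rows: reading condition; columns: [CT recommended, no CT recommended]
ct_table = np.array([[33, n_studies - 33],
                     [22, n_studies - 22]])
chi2, p_ct, dof, _ = chi2_contingency(ct_table)
print(f"Chi-squared: chi2={chi2:.2f}, dof={dof}, p={p_ct:.4g}")

# --- Reader AUC for fracture detection (simulated labels and scores) ---
y_true = rng.integers(0, 2, size=n_studies)
y_score = np.clip(y_true * 0.7 + rng.normal(0.3, 0.25, size=n_studies), 0, 1)
print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")

# Note: comparing two correlated AUCs (unaided vs. AI-assisted, same cases)
# requires the DeLong test, which is not provided by scipy or scikit-learn;
# a dedicated implementation or a tool such as R's pROC is typically used.
```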