Automated detection of spinal bone marrow oedema in axial spondyloarthritis: training and validation using two large phase 3 trial datasets

Bibliographic Details
Published in Rheumatology (Oxford, England) Vol. 64; no. 10; pp. 5446-5454
Main Authors Jamaludin, Amir; Windsor, Rhydian; Ather, Sarim; Kadir, Timor; Zisserman, Andrew; Braun, Juergen; Gensler, Lianne S; Østergaard, Mikkel; Poddubnyy, Denis; Coroller, Thibaud; Porter, Brian; Ligozio, Gregory; Readie, Aimee; Machado, Pedro M
Format Journal Article
Language English
Published England Oxford University Press 01.10.2025
Oxford Publishing Limited (England)
ISSN 1462-0324
1462-0332
DOI 10.1093/rheumatology/keaf323

Summary: Abstract

Objective: To evaluate the performance of machine learning (ML) models for the automated scoring of spinal MRI bone marrow oedema (BMO) in patients with axial spondyloarthritis (axSpA) and compare them with expert scoring.

Methods: ML algorithms using SpineNet software were trained and validated on 3483 spinal MRIs from 686 axSpA patients across two clinical trial datasets. The scoring pipeline involved (i) detection and labelling of vertebral bodies and (ii) classification of vertebral units for the presence or absence of BMO. Two models were tested: Model 1, without manual segmentation, and Model 2, incorporating an intermediate manual segmentation step. Model outputs were compared with those of human experts using kappa statistics, balanced accuracy, sensitivity, specificity and AUC.

Results: Both models performed comparably to expert readers regarding presence vs absence of BMO. Model 1 outperformed Model 2, with an AUC of 0.94 (vs 0.88), accuracy of 75.8% (vs 70.5%) and kappa of 0.50 (vs 0.31) using absolute reader consensus scoring as the external reference; this performance was similar to the expert inter-reader accuracy of 76.8% and kappa of 0.47 in a radiographic axSpA dataset. In a non-radiographic axSpA dataset, Model 1 achieved an AUC of 0.97 (vs 0.91 for Model 2), accuracy of 74.6% (vs 70%) and kappa of 0.52 (vs 0.27), comparable to the expert inter-reader accuracy of 74.2% and kappa of 0.46.

Conclusion: ML software shows potential for automated MRI BMO assessment in axSpA, offering benefits such as improved consistency, reduced labour costs and minimized inter- and intra-reader variability.

Trial registration: Clinicaltrials.gov, http://clinicaltrials.gov, MEASURE 1 study (NCT01358175); PREVENT study (NCT02696031).
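The agreement measures named in the Methods (kappa, balanced accuracy, sensitivity and specificity) can be illustrated on binary BMO labels per vertebral unit. The following is a minimal sketch, not the authors' evaluation code: the function name `agreement_metrics` and the toy label lists are hypothetical, and it assumes labels are coded 1 (BMO present) and 0 (BMO absent), with the reader consensus as the reference.

```python
def agreement_metrics(reference, predicted):
    """Compare binary model labels with reference labels (1 = BMO present).

    Illustrative helper only; the name and label coding are assumptions.
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    tn = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 0)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both classes")

    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    balanced_accuracy = (sensitivity + specificity) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement.
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


# Toy data (hypothetical, not from the study): ten vertebral units.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = agreement_metrics(reference, predicted)
```

Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy when BMO-negative units greatly outnumber positive ones. AUC is omitted here because it requires continuous model scores rather than binary labels.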