Reproducible Radiomics Features from Multi‐MRI‐Scanner Test–Retest‐Study: Influence on Performance and Generalizability of Models

Bibliographic Details
Published in	Journal of Magnetic Resonance Imaging, Vol. 61, No. 2, pp. 676–686
Main Authors	Wennmann, Markus, Rotkopf, Lukas T., Bauer, Fabian, Hielscher, Thomas, Kächele, Jessica, Mai, Elias K., Weinhold, Niels, Raab, Marc‐Steffen, Goldschmidt, Hartmut, Weber, Tim F., Schlemmer, Heinz‐Peter, Delorme, Stefan, Maier‐Hein, Klaus, Neher, Peter
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc.; Wiley Subscription Services, Inc.; 01.02.2025
Subjects	Adult; Aged; Algorithms; Biopsy; Bone marrow; Bone Marrow - diagnostic imaging; Cell culture; Correlation coefficient; Correlation coefficients; feature selection; Female; Field strength; generalizability; Humans; Image Interpretation, Computer-Assisted - methods; Image Processing, Computer-Assisted - methods; In vivo methods and tests; Infiltration; Learning algorithms; Machine Learning; Magnetic resonance imaging; Magnetic Resonance Imaging - methods; Male; Middle Aged; multicenter; Population studies; Radiomics; Rank tests; reproducibility; Reproducibility of Results; Retrospective Studies; Scanners; Statistical analysis; Statistical tests; Test sets
ISSN	1053-1807; 1522-2586
DOI	10.1002/jmri.29442

Summary:	Background: Radiomics models trained on data from one center typically show a decline in performance when applied to data from external centers, hindering their introduction into large‐scale clinical practice. Current expert recommendations suggest using only reproducible radiomics features, isolated by multiscanner test–retest experiments, which might help to overcome the problem of limited generalizability to external data.
Purpose: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi‐MRI‐scanner test–retest study, on the performance and generalizability of radiomics models.
Study Type: Retrospective.
Population: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from centers 2–8).
Field Strength/Sequence: 1.5 T and 3.0 T; T1‐weighted turbo spin echo.
Assessment: The radiomics models were tasked with noninvasively predicting plasma cell infiltration, as determined by bone marrow biopsy, from MRI. Radiomics machine learning models, including a linear regressor, a support vector regressor (SVR), and a random forest regressor (RFR), were trained on data from center 1 using either all radiomics features or only the reproducible radiomics features. The models were tested on an internal data set (center 1) and a multicentric external data set (centers 2–8).
Statistical Tests: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration; Fisher's z‐transformation, Wilcoxon signed‐rank test, and Wilcoxon rank‐sum test; significance level P < 0.05.
Results: When only reproducible features were used instead of all features, the performance of the SVR on the external test set improved significantly (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2). For the RFR, the performance on the external test set deteriorated, although not significantly, when only reproducible features were used instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10).
Conclusion: Using only reproducible radiomics features improved the external performance of some, but not all, machine learning models and did not automatically improve the external performance of the overall best radiomics model.
Level of Evidence: 3.
Technical Efficacy: Stage 2.