Evaluating CNN Architectures for the Automated Detection and Grading of Modic Changes in MRI: A Comparative Study

Bibliographic Details
Published in	Orthopaedic Surgery Vol. 17; no. 1; pp. 233–243
Main Authors	Xing, Li‐peng; Liu, Gang; Zhang, Hao‐chen; Wang, Lei; Zhu, Shan; Bao, Man Du La Hua; Wang, Yan‐ni; Chen, Chao; Wang, Zhi; Liu, Xin‐yu; Zhang, Shuai; Yang, Qiang
Format	Journal Article
Language	English
Published	Melbourne: John Wiley & Sons Australia, Ltd; John Wiley & Sons, Inc, 01.01.2025
Subjects	Automation; Bone marrow; Classification; Datasets; Deep learning; Endplate osteochondritis; Female; Humans; Magnetic resonance imaging; Magnetic Resonance Imaging - methods; Male; Middle Aged; Modic changes; Neural Networks, Computer; Retrospective Studies; Netherlands; China; Shanghai China
ISSN	1757-7853 1757-7861
DOI	10.1111/os.14280

Summary:	Objective: The Modic changes (MCs) classification system is the most widely used method for characterizing subchondral vertebral marrow changes on magnetic resonance imaging (MRI). However, because it is semiquantitative, it is highly sensitive to variations in MRI. In 2021, the authors of this classification system proposed a quantitative and reliable MC grading method, but automated tools for grading MCs are still lacking. This study developed convolutional neural networks (CNNs) to detect MCs and to grade them by their maximum vertical extent, and investigated their performance. To verify this performance, we tested the CNNs' generalization, compared the CNN's performance with that of junior doctors, and assessed the consistency of junior doctors after AI assistance.
Methods: A retrospective analysis was conducted of MRIs from 139 patients with MCs, annotated by a spine surgeon. MRIs from 109 of these patients were acquired on Philips scanners between June 2020 and June 2021 and constitute Dataset 1. MRIs from the remaining 30 patients were acquired on both Philips and United Imaging scanners between June 2022 and March 2023 and form Dataset 2. YOLOv8 and YOLOv5 models were developed in Python with the PyTorch deep learning framework, using the PyCharm IDE; data augmentation and transfer learning were applied to improve model generalization. Model performance was compared using precision, recall, F1 score, and mAP50 (mean average precision at an intersection-over-union threshold of 0.5). Generalizability was also tested on the second dataset (Dataset 2), where the model's performance was compared with that of a junior doctor. Post hoc, the junior doctor graded Dataset 2 with CNN assistance. In addition, regions of interest were visualized with class activation mapping (CAM) heat maps.
Results: On the unseen test set, the YOLOv8 and YOLOv5 models achieved precision of 81.60% and 61.59%, recall of 80.90% and 67.16%, mAP50 of 84.40% and 68.88%, and F1 scores of 0.81 and 0.60, respectively.
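The detection metrics reported above follow standard definitions. The minimal sketch below assumes axis-aligned bounding boxes given as (x1, y1, x2, y2); the helper names `iou`, `is_match`, `precision_recall`, and `f1_score` are hypothetical and not taken from the study. It shows how precision, recall, and F1 relate, and the intersection-over-union test that mAP50 uses to count a predicted box as a true positive (IoU of at least 0.5 with a ground-truth box of the same class). It does not compute mAP50 itself, which additionally ranks predictions by confidence and averages the area under each class's precision-recall curve.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]

IOU_THRESHOLD = 0.5  # the "50" in mAP50


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box) -> bool:
    """True positive at IoU >= 0.5 (both boxes assumed to share a class)."""
    return iou(pred, truth) >= IOU_THRESHOLD


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported YOLOv8 test-set precision and recall, as fractions.
print(round(f1_score(0.8160, 0.8090), 2))  # 0.81
```

Plugging in the reported YOLOv8 precision and recall gives an F1 of about 0.81, matching the abstract. The same formula applied to the YOLOv5 figures gives about 0.64 rather than 0.60, so the YOLOv5 F1 may have been computed differently (for example, averaged per class or at a different confidence threshold); the abstract does not specify.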
On Dataset 2, YOLOv8 and the junior doctor achieved precision of 95.1% and 72.5% and recall of 68.3% and 60.6%, respectively. In the AI‐assisted experiment, agreement between the junior doctor and the senior spine surgeon improved significantly, with Cohen's kappa rising from 0.368 to 0.681.
Conclusions: YOLOv8 was significantly superior to YOLOv5 in the automated detection and grading of MCs on MRI. YOLOv8 also outperformed junior doctors; used as an aid, it enhanced their diagnostic accuracy and efficiency and improved the reliability of their diagnoses, helping them achieve more consistent results comparable to those of senior spine surgeons. To our knowledge, this is the first study to grade MCs quantitatively using deep learning.
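Cohen's kappa, used above to measure agreement between the junior doctor and the senior spine surgeon, corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the chance agreement implied by each rater's label frequencies. The minimal sketch below implements the unweighted form with a hypothetical `cohens_kappa` helper; the abstract does not state whether an unweighted or a weighted kappa (a common choice for ordinal grades) was used, and the MC grades in the example are invented, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, and 1.0 is returned by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical MC grades for eight lesions (not data from the study).
senior = [1, 2, 3, 1, 2, 3, 1, 2]
junior = [1, 2, 2, 1, 3, 3, 1, 1]
print(round(cohens_kappa(senior, junior), 4))  # 0.4286
```

Here the raters agree on 5 of 8 lesions (p_o = 0.625), chance agreement is 22/64 (p_e ≈ 0.344), and κ = 3/7. For ordinal grades, scikit-learn's `cohen_kappa_score` also accepts `weights="linear"` or `weights="quadratic"`, which penalize near-miss grades less than distant ones.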
Bibliography:	Qiang Yang and Shuai Zhang should be considered joint corresponding authors. Li‐peng Xing and Gang Liu are co‐first authors of this article. Funding: This work was supported by the National Natural Science Foundation of China (52377224), the Central Guidance for Local Scientific and Technological Development Foundation (236Z7711G), the Tianjin Municipal Health Commission (TJHIA‐2023‐011), and the Tianjin Science and Technology Major Projects and “Unveiled” Major Projects (21ZXJBSY00130).