A statistical approach to automated analysis of the low‐contrast object detectability test for the large ACR MRI phantom
| Published in | Journal of applied clinical medical physics Vol. 26; no. 7; pp. e70173 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | United States: John Wiley & Sons, Inc, 01.07.2025 |
| Subjects | |
| ISSN | 1526-9914 |
| DOI | 10.1002/acm2.70173 |
Summary:

Background
Regular quality control checks are essential to ensure the quality of MRI systems. The American College of Radiology (ACR) has developed a standardized large phantom test protocol for this purpose. However, the ACR protocol recommends manual measurements, which are time‐consuming, labor‐intensive, and prone to variability, impacting accuracy and reproducibility. Although some aspects of the ACR evaluation have been automated or semi‐automated, tests such as low‐contrast object detectability (LCOD) remain challenging to automate. LCOD involves assessing the visibility of objects at various contrast levels.
Purpose
The purpose of this research is to propose and evaluate an automated approach for LCOD testing in MRI.
Methods
The automated Python code generates one‐dimensional profiles of image intensities along radial paths from the center of the low‐contrast disk. These profiles are compared to templates created from the disk's geometric information using general linear model statistical tests. A total of 80 image volumes (40 T1‐ and 40 T2‐weighted) were assessed twice by two human evaluators and by the proposed Python code.
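As a rough illustration only (not the authors' published implementation), the sketch below samples radial intensity profiles around a disk center and tests an idealized disk‐shaped regressor in an ordinary least‐squares GLM. The helper names (radial_profile, disk_template, disk_detected), the step‐function template, and the significance threshold are assumptions made for this example.

```python
# Illustrative sketch only, not the authors' published code: the geometry,
# template shape, helper names, and the alpha threshold are all assumptions.
import numpy as np
from scipy import ndimage, stats


def radial_profile(image, center, angle_deg, length, n_samples=64):
    """Sample image intensities along one radial path starting at `center`."""
    angle = np.deg2rad(angle_deg)
    radii = np.linspace(0.0, length, n_samples)
    rows = center[0] + radii * np.sin(angle)
    cols = center[1] + radii * np.cos(angle)
    # Bilinear interpolation (order=1) along the path.
    return ndimage.map_coordinates(image.astype(float), [rows, cols], order=1)


def disk_template(length, disk_radius, n_samples=64):
    """Idealized step template: 1 inside the low-contrast object, 0 outside."""
    radii = np.linspace(0.0, length, n_samples)
    return (radii <= disk_radius).astype(float)


def disk_detected(image, center, disk_radius, path_length, n_angles=16, alpha=0.05):
    """Call an object 'visible' if the template regressor in an ordinary
    least-squares GLM fit of the mean radial profile is significant."""
    profiles = np.stack([
        radial_profile(image, center, a, path_length)
        for a in np.linspace(0.0, 360.0, n_angles, endpoint=False)
    ])
    y = profiles.mean(axis=0)                    # averaged radial profile
    x = disk_template(path_length, disk_radius)  # expected object shape
    X = np.column_stack([np.ones_like(x), x])    # design: intercept + template
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    se = np.sqrt((resid @ resid / dof) * np.linalg.inv(X.T @ X)[1, 1])
    t_stat = beta[1] / se
    p_value = stats.t.sf(t_stat, dof)            # one-sided: object brighter than background
    return p_value < alpha
```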
Results
Human raters showed intra‐rater variability (Cohen's Kappa 0.941 and 0.962), while the Python code exhibited perfect intra‐rater agreement. Inter‐rater agreement between the code and the human raters was comparable to human‐to‐human agreement (Cohen's Kappa 0.878 between the two human raters vs. 0.945 and 0.783 between the code and each human rater). A stress test revealed that both the human raters and the code assigned higher scores to lower‐bandwidth images and lower scores to higher‐bandwidth images.
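For context, the agreement values above are Cohen's Kappa statistics. Given two raters' per‐object visibility calls, such a value can be computed as shown below; the rater arrays here are hypothetical and serve only to illustrate the call, not to reproduce the study data.

```python
# Hypothetical visibility calls for illustration only; the study data are not shown here.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 0, 1, 1, 0]  # rater A: object visible (1) / not visible (0)
rater_b = [1, 1, 0, 1, 1, 1, 0, 0]  # rater B on the same objects
print(cohen_kappa_score(rater_a, rater_b))
```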
Conclusion
The proposed automated method eliminates intra‐rater variability and achieves strong inter‐rater agreement with human raters. These findings suggest the method is reliable and suitable for clinical settings, showing high concordance with human assessments.