Impact of Different Mammography Systems on Artificial Intelligence Performance in Breast Cancer Screening
| Published in | Radiology. Artificial intelligence Vol. 5; no. 3; p. e220146 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | United States, Radiological Society of North America, 01.05.2023 |
| ISSN | 2638-6100 |
| DOI | 10.1148/ryai.220146 |
| Summary: | Artificial intelligence (AI) tools may assist breast screening mammography programs, but limited evidence supports their generalizability to new settings. This retrospective study used a 3-year dataset (April 1, 2016-March 31, 2019) from a U.K. regional screening program. The performance of a commercially available breast screening AI algorithm was assessed with a prespecified and site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women (aged approximately 50-70 years) who attended routine screening, excluding self-referrals, those with complex physical requirements, those who had undergone a previous mastectomy, and those who underwent screening that had technical recalls or did not have the four standard image views. In total, 55 916 screening attendees (mean age, 60 years ± 6 [SD]) met the inclusion criteria. The prespecified threshold resulted in high recall rates (48.3%, 21 929 of 45 444), which reduced to 13.0% (5896 of 45 444) following threshold calibration, closer to the observed service level (5.0%, 2774 of 55 916). Recall rates also increased approximately threefold following a software upgrade on the mammography equipment, requiring per-software version thresholds. Using software-specific thresholds, the AI algorithm would have recalled 277 of 303 (91.4%) screen-detected cancers and 47 of 138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, while quality assurance systems should monitor AI performance for consistency.
Keywords: Breast, Screening, Mammography, Computer Applications-Detection/Diagnosis, Neoplasms-Primary, Technology Assessment
© RSNA, 2023. |
|---|---|
| Bibliography: | Author contributions: Guarantors of integrity of entire study, C.F.d.V., R.T.S., J.A.D., L.A.A., G.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, C.F.d.V., R.T.S., D.D., M.B., G.L.; clinical studies, R.T.S., G.L.; experimental studies, S.J.C., R.T.S., D.J.H.; statistical analysis, C.F.d.V., S.J.C., R.T.S., J.A.D.; and manuscript editing, C.F.d.V., S.J.C., R.T.S., J.A.D., D.D., M.B., D.J.H., L.A.A., G.L. |
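
The percentages quoted in the summary can be reproduced from the counts it reports. The following is a minimal sketch of that arithmetic only. The dictionary keys and the `percent` helper are illustrative names chosen here, not terms from the article, and nothing below reflects how the authors computed or calibrated their thresholds.

```python
# Counts exactly as reported in the summary: (numerator, denominator).
# The key names are hypothetical labels chosen for this sketch.
REPORTED = {
    "recall_prespecified_threshold": (21_929, 45_444),
    "recall_calibrated_threshold": (5_896, 45_444),
    "recall_observed_service": (2_774, 55_916),
    "screen_detected_cancers_recalled": (277, 303),
    "interval_cancers_recalled": (47, 138),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Recompute every rate from its raw counts.
rates = {name: percent(n, d) for name, (n, d) in REPORTED.items()}

for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

The recomputed values agree with the figures in the summary (48.3%, 13.0%, 5.0%, 91.4%, and 34.1%). Note that the AI recall rates use a denominator of 45 444, whereas the observed service rate uses all 55 916 attendees.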