Impact of Different Mammography Systems on Artificial Intelligence Performance in Breast Cancer Screening

Bibliographic Details
Published in Radiology. Artificial Intelligence Vol. 5; no. 3; p. e220146
Main Authors de Vries, Clarisse F., Colosimo, Samantha J., Staff, Roger T., Dymiter, Jaroslaw A., Yearsley, Joseph, Dinneen, Deirdre, Boyle, Moragh, Harrison, David J., Anderson, Lesley A., Lip, Gerald, Black, Corri, Murray, Alison D., Wilde, Katie, Blackwood, James D., Butterly, Claire, Zurowski, John, Eilbeck, Jon, McSkimming, Colin
Format Journal Article
Language English
Published United States Radiological Society of North America 01.05.2023
ISSN 2638-6100
DOI 10.1148/ryai.220146

Summary: Artificial intelligence (AI) tools may assist breast screening mammography programs, but limited evidence supports their generalizability to new settings. This retrospective study used a 3-year dataset (April 1, 2016-March 31, 2019) from a U.K. regional screening program. The performance of a commercially available breast screening AI algorithm was assessed with a prespecified and a site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women (aged approximately 50-70 years) who attended routine screening, excluding self-referrals, those with complex physical requirements, those who had undergone a previous mastectomy, and those whose screening involved a technical recall or lacked the four standard image views. In total, 55 916 screening attendees (mean age, 60 years ± 6 [SD]) met the inclusion criteria. The prespecified threshold resulted in a high recall rate (48.3%, 21 929 of 45 444), which fell to 13.0% (5896 of 45 444) following threshold calibration, closer to the observed service level (5.0%, 2774 of 55 916). Recall rates also increased approximately threefold following a software upgrade on the mammography equipment, requiring per-software version thresholds. Using software-specific thresholds, the AI algorithm would have recalled 277 of 303 (91.4%) screen-detected cancers and 47 of 138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, and quality assurance systems should monitor AI performance for consistency. Keywords: Breast, Screening, Mammography, Computer Applications-Detection/Diagnosis, Neoplasms-Primary, Technology Assessment. © RSNA, 2023.
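The percentages quoted in the summary can be recomputed from the counts it reports. The following is a minimal Python sketch of that arithmetic only; the dictionary labels and the helper name `rate_percent` are illustrative assumptions and do not come from the article, and nothing here reproduces the authors' analysis or the AI algorithm itself. Note that the AI recall rates use a denominator of 45 444 while the service recall rate uses 55 916, as stated in the summary.

```python
# Recompute the rates quoted in the summary from the reported counts.
# Labels and the helper name are illustrative, not taken from the article.

def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# (numerator, denominator) pairs exactly as reported in the summary.
REPORTED_COUNTS = {
    "AI recall, prespecified threshold": (21_929, 45_444),
    "AI recall, calibrated threshold": (5_896, 45_444),
    "Observed service recall": (2_774, 55_916),
    "Screen-detected cancers recalled by AI": (277, 303),
    "Interval cancers recalled by AI": (47, 138),
}

rates = {label: rate_percent(n, d) for label, (n, d) in REPORTED_COUNTS.items()}

for label, value in rates.items():
    print(f"{label}: {value:.1f}%")
```

Each computed value agrees with the percentage given next to the corresponding counts in the summary.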
Author contributions: Guarantors of integrity of entire study, C.F.d.V., R.T.S., J.A.D., L.A.A., G.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, C.F.d.V., R.T.S., D.D., M.B., G.L.; clinical studies, R.T.S., G.L.; experimental studies, S.J.C., R.T.S., D.J.H.; statistical analysis, C.F.d.V., S.J.C., R.T.S., J.A.D.; and manuscript editing, C.F.d.V., S.J.C., R.T.S., J.A.D., D.D., M.B., D.J.H., L.A.A., G.L.