Impact of Different Mammography Systems on Artificial Intelligence Performance in Breast Cancer Screening

Bibliographic Details
Published in	Radiology. Artificial intelligence Vol. 5; no. 3; p. e220146
Main Authors	de Vries, Clarisse F., Colosimo, Samantha J., Staff, Roger T., Dymiter, Jaroslaw A., Yearsley, Joseph, Dinneen, Deirdre, Boyle, Moragh, Harrison, David J., Anderson, Lesley A., Lip, Gerald, Black, Corri, Murray, Alison D., Wilde, Katie, Blackwood, James D., Butterly, Claire, Zurowski, John, Eilbeck, Jon, McSkimming, Colin
Format	Journal Article
Language	English
Published	United States: Radiological Society of North America, 01.05.2023
Subjects	AI in Brief; Breast; Neoplasms-Primary; Screening; Computer Applications-Detection/Diagnosis; Mammography; Technology Assessment
ISSN	2638-6100
DOI	10.1148/ryai.220146

Summary:	Artificial intelligence (AI) tools may assist breast screening mammography programs, but limited evidence supports their generalizability to new settings. This retrospective study used a 3-year dataset (April 1, 2016, to March 31, 2019) from a U.K. regional screening program. The performance of a commercially available breast screening AI algorithm was assessed with a prespecified and a site-specific decision threshold to evaluate whether its performance was transferable to a new clinical site. The dataset consisted of women (aged approximately 50-70 years) who attended routine screening, excluding self-referrals, those with complex physical requirements, those who had undergone a previous mastectomy, and those whose screening involved a technical recall or lacked the four standard image views. In total, 55 916 screening attendees (mean age, 60 years ± 6 [SD]) met the inclusion criteria. The prespecified threshold resulted in high recall rates (48.3%, 21 929 of 45 444), which fell to 13.0% (5896 of 45 444) following threshold calibration, closer to the observed service level (5.0%, 2774 of 55 916). Recall rates also increased approximately threefold following a software upgrade on the mammography equipment, requiring a separate threshold for each software version. Using software-specific thresholds, the AI algorithm would have recalled 277 of 303 (91.4%) screen-detected cancers and 47 of 138 (34.1%) interval cancers. AI performance and thresholds should be validated for new clinical settings before deployment, and quality assurance systems should monitor AI performance for consistency. Keywords: Breast, Screening, Mammography, Computer Applications-Detection/Diagnosis, Neoplasms-Primary, Technology Assessment. © RSNA, 2023.
Bibliography:	Author contributions: Guarantors of integrity of entire study, C.F.d.V., R.T.S., J.A.D., L.A.A., G.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, C.F.d.V., R.T.S., D.D., M.B., G.L.; clinical studies, R.T.S., G.L.; experimental studies, S.J.C., R.T.S., D.J.H.; statistical analysis, C.F.d.V., S.J.C., R.T.S., J.A.D.; and manuscript editing, C.F.d.V., S.J.C., R.T.S., J.A.D., D.D., M.B., D.J.H., L.A.A., G.L.
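
The percentages quoted in the summary can be recomputed from the counts given alongside them. The following is a minimal Python sketch of that arithmetic check. The function name `rate_percent` and the `REPORTED` table layout are illustrative assumptions and are not part of the study or of the AI product it evaluated; only the counts and percentages come from the summary.

```python
# Recompute the percentages quoted in the summary from their counts.
# Names here (rate_percent, REPORTED) are hypothetical, for illustration only.

def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# label -> (numerator, denominator, percentage as quoted in the summary)
REPORTED = {
    "AI recall rate, prespecified threshold": (21929, 45444, 48.3),
    "AI recall rate, calibrated threshold": (5896, 45444, 13.0),
    "observed service recall rate": (2774, 55916, 5.0),
    "screen-detected cancers recalled by AI": (277, 303, 91.4),
    "interval cancers recalled by AI": (47, 138, 34.1),
}

if __name__ == "__main__":
    for label, (num, den, quoted) in REPORTED.items():
        computed = rate_percent(num, den)
        status = "matches" if computed == quoted else "differs from"
        print(f"{label}: {computed}% ({status} quoted {quoted}%)")
```

Note that the two AI recall rates use a denominator of 45 444 while the service recall rate uses all 55 916 attendees, so the sketch checks each figure against its own stated counts rather than against a common base.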