Artificial intelligence–based image analysis in clinical testing: lessons from cervical cancer screening

Bibliographic Details
Published in	JNCI: Journal of the National Cancer Institute Vol. 116; no. 1; pp. 26-33
Main Authors	Egemen, Didem; Perkins, Rebecca B; Cheung, Li C; Befano, Brian; Rodriguez, Ana Cecilia; Desai, Kanan; Lemay, Andreanne; Ahmed, Syed Rakin; Antani, Sameer; Jeronimo, Jose; Wentzensen, Nicolas; Kalpathy-Cramer, Jayashree; De Sanjose, Silvia; Schiffman, Mark
Format	Journal Article
Language	English
Published	United States Oxford University Press 10.01.2024 Oxford Publishing Limited (England)
Subjects	Algorithms Artificial Intelligence Cancer screening Cervical cancer Diagnostic tests Early Detection of Cancer Epidemiology Female Humans Image analysis Image processing Image Processing, Computer-Assisted Medical imaging Medical screening Risk Uterine Cervical Neoplasms - diagnosis
ISSN	0027-8874, 1460-2105
DOI	10.1093/jnci/djad202

Summary:	Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports of outstanding accuracy have been followed by a disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case > indeterminate > control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. The fourth principle is linking the AI algorithm results to clinical risk estimation. Absolute risk (not relative risk) is the critical metric for translating a test result into clinical use. The final principle is generating risk-based guidelines for clinical use that match local resources and priorities. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
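
Illustrative note: the summary's fourth principle, that absolute risk rather than relative risk is the metric to link to clinical use, can be made concrete with a small sketch. The code below is not the authors' method. The counts are made up for illustration, and the function names `absolute_risk_by_class` and `relative_risk` are hypothetical. It assumes each image has one of the three ordered classes named in the summary (control, indeterminate, case) and a confirmed disease status.

```python
from collections import Counter

# Classes in increasing severity, as described in the summary.
SEVERITY_ORDER = ["control", "indeterminate", "case"]


def absolute_risk_by_class(records):
    """Return {class: fraction of records in that class with confirmed disease}.

    `records` is an iterable of (ai_class, has_disease) pairs.
    Classes with no records are omitted from the result.
    """
    totals = Counter()
    positives = Counter()
    for ai_class, has_disease in records:
        totals[ai_class] += 1
        if has_disease:
            positives[ai_class] += 1
    return {c: positives[c] / totals[c] for c in SEVERITY_ORDER if totals[c]}


def relative_risk(risks, exposed, reference):
    """Ratio of two absolute risks; says nothing about how large either one is."""
    return risks[exposed] / risks[reference]


# Made-up counts, for illustration only (not data from the article).
records = (
    [("control", True)] * 2 + [("control", False)] * 198
    + [("indeterminate", True)] * 5 + [("indeterminate", False)] * 45
    + [("case", True)] * 20 + [("case", False)] * 30
)

risks = absolute_risk_by_class(records)
for ai_class in SEVERITY_ORDER:
    print(f"{ai_class}: absolute risk = {risks[ai_class]:.2f}")
print(f"case vs control relative risk = {relative_risk(risks, 'case', 'control'):.1f}")
```

In this sketch the relative risk only compares classes with each other, whereas the per-class absolute risks are the quantities that a risk-based guideline could compare against locally chosen action thresholds.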