Artificial intelligence–based image analysis in clinical testing: lessons from cervical cancer screening


Bibliographic Details
Published in JNCI : Journal of the National Cancer Institute Vol. 116; no. 1; pp. 26-33
Main Authors Egemen, Didem, Perkins, Rebecca B, Cheung, Li C, Befano, Brian, Rodriguez, Ana Cecilia, Desai, Kanan, Lemay, Andreanne, Ahmed, Syed Rakin, Antani, Sameer, Jeronimo, Jose, Wentzensen, Nicolas, Kalpathy-Cramer, Jayashree, De Sanjose, Silvia, Schiffman, Mark
Format Journal Article
Language English
Published United States Oxford University Press 10.01.2024
Oxford Publishing Limited (England)
ISSN 0027-8874
1460-2105
DOI 10.1093/jnci/djad202

Summary: Novel screening and diagnostic tests based on artificial intelligence (AI) image recognition algorithms are proliferating. Some initial reports of outstanding accuracy have been followed by a disappointing lack of confirmation, including our own early work on cervical screening. This is a presentation of lessons learned, organized as a conceptual step-by-step approach to bridge the gap between the creation of an AI algorithm and clinical efficacy. The first fundamental principle is specifying rigorously what the algorithm is designed to identify and what the test is intended to measure (eg, screening, diagnostic, or prognostic). The second is designing the AI algorithm to minimize the most clinically important errors. For example, many equivocal cervical images cannot yet be labeled because the borderline between cases and controls is blurred. To avoid a misclassified case-control dichotomy, we have isolated the equivocal cases and formally included an intermediate, indeterminate class (severity order of classes: case > indeterminate > control). The third principle is evaluating AI algorithms like any other test, using clinical epidemiologic criteria. Repeatability of the algorithm at the borderline, for indeterminate images, has proven extremely informative. Distinguishing between internal and external validation is also essential. The fourth principle is linking the AI algorithm results to clinical risk estimation. Absolute risk (not relative risk) is the critical metric for translating a test result into clinical use. The last principle in our approach is generating risk-based guidelines for clinical use that match local resources and priorities. We are particularly interested in applications to lower-resource settings to address health disparities. We note that similar principles apply to other domains of AI-based image analysis for medical diagnostic testing.
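
The contrast the summary draws between absolute and relative risk, applied to the three ordered classes (case > indeterminate > control), can be illustrated with a minimal Python sketch. This is not the authors' method or code: the function name `absolute_risk_by_class`, the record layout, and all counts are hypothetical and made up purely for illustration.

```python
from collections import Counter

# Severity order stated in the summary: case > indeterminate > control.
CLASSES = ("control", "indeterminate", "case")


def absolute_risk_by_class(records):
    """Return {class: fraction of images in that class with confirmed disease}.

    `records` is an iterable of (predicted_class, has_disease) pairs.
    Hypothetical helper for illustration; not taken from the article.
    """
    totals = Counter()
    diseased = Counter()
    for predicted, has_disease in records:
        if predicted not in CLASSES:
            raise ValueError(f"unknown class: {predicted!r}")
        totals[predicted] += 1
        if has_disease:
            diseased[predicted] += 1
    # Skip classes with no images to avoid dividing by zero.
    return {c: diseased[c] / totals[c] for c in CLASSES if totals[c]}


# Made-up toy counts for illustration only; NOT data from the article.
toy = (
    [("control", False)] * 98 + [("control", True)] * 2
    + [("indeterminate", False)] * 45 + [("indeterminate", True)] * 5
    + [("case", False)] * 30 + [("case", True)] * 20
)

risk = absolute_risk_by_class(toy)
# Relative risk compares two classes but hides the baseline level of risk.
relative_risk = risk["case"] / risk["control"]
```

With these toy counts, the same relative risk could arise from very different absolute risks, which is one way to read the summary's point that absolute risk is the quantity that maps onto clinical action thresholds.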