Active Semi-Supervised Learning via Bayesian Experimental Design for Lung Cancer Classification Using Low Dose Computed Tomography Scans

Bibliographic Details
Published in	Applied sciences Vol. 13; no. 6; p. 3752
Main Authors	Nguyen, Phuong; Rathod, Ankita; Chapman, David; Prathapan, Smriti; Menon, Sumeet; Morris, Michael; Yesha, Yelena
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.03.2023
Subjects	Active learning; Algorithms; Analysis; Artificial intelligence; Biopsy; Cancer; Classification; computer-aided diagnosis; CT imaging; Data mining; Datasets; Deep learning; Diagnosis; Diagnostic imaging; expectation maximization; Labeling; Lung cancer; lung cancer screening; Medical imaging; Medical imaging equipment; Medical screening; Neural networks; Oncology, Experimental; Radiomics; Semantics; Tomography; Taiwan
ISSN	2076-3417
DOI	10.3390/app13063752

Summary:	We introduce an active, semi-supervised algorithm that uses Bayesian experimental design to address the shortage of annotated images needed to train and validate artificial intelligence (AI) models for lung cancer screening with computed tomography (CT) scans. Our approach combines active learning with semi-supervised expectation maximization, emulating a human in the loop who supplies additional ground truth labels used to train, evaluate, and update the neural network models. Bayesian experimental design identifies which unlabeled samples most need ground truth labels to improve the model's performance. We evaluate the proposed method, Active Semi-supervised Expectation Maximization for Computer-Aided Diagnosis tasks (ASEM-CAD), on lung cancer classification using three public CT scan datasets: the National Lung Screening Trial (NLST), the Lung Image Database Consortium (LIDC), and the Kaggle Data Science Bowl 2017. ASEM-CAD classifies suspicious lung nodules and lung cancer cases with an area under the curve (AUC) of 0.94 (Kaggle), 0.95 (NLST), and 0.88 (LIDC) while using significantly fewer labeled images than a fully supervised model. This study addresses one of the significant challenges in early lung cancer screening with low-dose computed tomography (LDCT) scans and contributes to the development and validation of deep learning algorithms for lung cancer screening and other diagnostic radiology examinations.
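
The sample-selection idea in the summary can be illustrated with a minimal sketch. The summary does not state which acquisition criterion the authors use, so this example assumes predictive entropy as a simple stand-in for expected information gain; the function names, the probability values, and the labeling budget are all hypothetical and are not taken from the paper. Samples that are not sent to the annotator keep their soft model predictions, loosely mirroring the role of the expectation step in semi-supervised expectation maximization.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(unlabeled_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples.

    Assumption: entropy is used as the acquisition score; the paper's
    actual Bayesian experimental design criterion may differ.
    """
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: predictive_entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


def pseudo_label(unlabeled_probs, queried):
    """Keep soft model predictions for samples not sent to the annotator."""
    return {i: p for i, p in enumerate(unlabeled_probs) if i not in queried}


# Hypothetical model outputs per scan: [P(benign), P(malignant)]
probs = [[0.98, 0.02], [0.55, 0.45], [0.10, 0.90], [0.50, 0.50]]

queried = select_for_labeling(probs, budget=2)   # scans to send for annotation
soft = pseudo_label(probs, set(queried))         # scans kept with soft labels

print("query ground truth for:", queried)
print("keep soft labels for:", sorted(soft))
```

In a full loop, the queried scans would receive ground truth labels from a human annotator, the model would be retrained on the labeled and soft-labeled data, and the selection step would repeat until the labeling budget is spent.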