Lung Cancer Survival Prediction via Machine Learning Regression, Classification, and Statistical Techniques

Bibliographic Details
Published in: 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) Vol. 2018; pp. 632-637
Main Authors: Bartholomai, James A.; Frieboes, Hermann B.
Format: Conference Proceeding, Journal Article
Language: English
Published: United States, IEEE, 01.12.2018
DOI: 10.1109/ISSPIT.2018.8642753

Summary: A regression model is developed to predict survival time in months for lung cancer patients. Earlier work showed that predictive models are accurate for short survival times of less than 6 months, but that accuracy drops when predicting longer survival times. This study combines regression models with a classification model to predict survival time. A set of de-identified lung cancer patient data was obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The models use a subset of factors selected by ANOVA. Model accuracy is measured by a confusion matrix for classification and by Root Mean Square Error (RMSE) for regression. Random Forests (RF) are used for classification, while general linear regression, Gradient Boosted Machines (GBM), and RF are used for regression. The regression results show that RF performed best for survival times ≤6 and >24 months (RMSE 10.52 and 20.51, respectively), while GBM performed best for 7-24 months (RMSE 15.65). Comparison plots of the results further indicate that the regression models perform better for shorter survival times than the RMSE values alone reflect.
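
The two accuracy measures named in the summary can be illustrated with a minimal sketch. This is not the authors' code. It assumes only the three survival-time ranges quoted in the summary (≤6, 7-24, and >24 months). All function names and sample values are hypothetical, and the numbers are made up for illustration, not taken from SEER.

```python
import math
from collections import Counter

# Survival-time ranges quoted in the summary (months).
BINS = ("<=6", "7-24", ">24")


def survival_bin(months):
    """Map a survival time in months to one of the three ranges."""
    if months <= 6:
        return "<=6"
    if months <= 24:
        return "7-24"
    return ">24"


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_per_bin(actual, predicted):
    """RMSE computed separately within each range of the actual survival time."""
    grouped = {b: ([], []) for b in BINS}
    for a, p in zip(actual, predicted):
        acts, preds = grouped[survival_bin(a)]
        acts.append(a)
        preds.append(p)
    return {b: rmse(acts, preds) for b, (acts, preds) in grouped.items() if acts}


def confusion_matrix(actual_bins, predicted_bins):
    """Counts keyed by (actual range, predicted range)."""
    counts = Counter(zip(actual_bins, predicted_bins))
    return {(a, p): counts[(a, p)] for a in BINS for p in BINS}


# Made-up illustrative values (assumption), not SEER data.
actual_months = [3, 5, 12, 20, 30, 48]
predicted_months = [4, 8, 10, 26, 27, 40]  # stand-in for regression output
predicted_bins = ["<=6", "7-24", "7-24", "7-24", ">24", "7-24"]  # stand-in for classifier output

per_bin = rmse_per_bin(actual_months, predicted_months)
matrix = confusion_matrix([survival_bin(m) for m in actual_months], predicted_bins)

for name in BINS:
    print(f"RMSE for {name} months: {per_bin[name]:.2f}")
print("correctly classified:", sum(matrix[(b, b)] for b in BINS), "of", len(actual_months))
```

Reporting RMSE per range, rather than one pooled value, mirrors how the summary quotes separate figures for each range. The confusion matrix is computed from a separate list of predicted ranges because the summary describes classification and regression as distinct models.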