Alzheimer-type dementia prediction by sparse logistic regression using claim data

Bibliographic Details
Published in	Computer Methods and Programs in Biomedicine, Vol. 196, Article 105582
Main Authors	Fukunishi, Hiroaki; Nishiyama, Mitsuki; Luo, Yuan; Kubo, Masahiro; Kobayashi, Yasuki
Format	Journal Article
Language	English
Published	Elsevier B.V., Ireland, 1 November 2020
Subjects	Aged; Alzheimer Disease - diagnosis; Alzheimer-type dementia; Health insurance claim data; Humans; Japan; Logistic Models; Long-term care insurance claim data; Machine Learning; Prediction; Sparse logistic regression
ISSN	0169-2607 1872-7565
DOI	10.1016/j.cmpb.2020.105582

Summary:	•This study developed an Alzheimer-type dementia prediction model based on health insurance claim data and long-term care insurance claim data for elderly Japanese people.
•Feature selection was a critical issue in utilizing claim data, which contain a large amount of information.
•Sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) were used for feature selection.
•SLR-L0 was more effective than SLR-L1 at selecting influential features.
This study aimed to predict the risk of Alzheimer-type dementia for persons aged over 75 years who were not receiving long-term care services, using routinely collected claim data. A refined dataset of 48,123 persons was prepared from health insurance and long-term care insurance claim data in a large city in a metropolitan area of Japan. The features used were the age and sex of the subjects, 502 diseases based on ICD-10 diagnosis codes, and 107 prescription drugs based on therapeutic classes, for a total of 611 features. The most important challenge in this work was feature selection from this large number of features. We adopted sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) as machine-learning classification models. These regularizations enable feature selection because model optimization yields a sparse solution in which only a few coefficients are non-zero. Predictions were made by integrating 100 predictors trained on bootstrap samples. The areas under the ROC curve (AUCs) were 0.663 for SLR-L0 and 0.660 for SLR-L1. Although predictive performance was similar, the average number of selected features was 13 of 611 for SLR-L0 versus 253 for SLR-L1. These results indicate that SLR-L1 tended to retain less useful features, whereas SLR-L0 narrowed the model down to influential features. SLR-L0 may therefore be more useful than SLR-L1 for practical use and for discussing risk factors with medical experts.
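To illustrate how the two penalties yield sparse models, the following minimal Python sketch fits logistic regression by proximal gradient descent on synthetic data (15 standardized Gaussian features, of which only the first three affect the outcome), not on claim data. The L1 model uses the soft-thresholding (ISTA) update. For L0, the sketch swaps in a cardinality-constrained formulation that keeps at most `k` non-zero coefficients via iterative hard thresholding. The summary does not specify the authors' L0 formulation or solver, so this choice is an assumption, as are the hypothetical helper names (`fit`, `keep_top_k`, `soft_threshold`) and the settings `lam=0.05` and `k=3`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(w, b, X, y):
    # Gradient of the mean logistic loss with respect to weights w and intercept b.
    n, d = len(X), len(w)
    gw, gb = [0.0] * d, 0.0
    for xi, yi in zip(X, y):
        r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
        for j in range(d):
            gw[j] += r * xi[j] / n
        gb += r / n
    return gw, gb


def soft_threshold(v, thr):
    # Proximal operator of the L1 penalty: shrink toward zero, exact zero if |v| <= thr.
    return math.copysign(max(abs(v) - thr, 0.0), v)


def keep_top_k(v, k):
    # Assumption: L0 posed as a constraint (at most k non-zeros); keep the k largest |v_j|.
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]


def fit(X, y, penalty, lam=0.05, k=3, step=1.0, iters=300):
    # Proximal gradient descent; the intercept b is never penalized.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iters):
        gw, gb = gradient(w, b, X, y)
        v = [wj - step * gj for wj, gj in zip(w, gw)]
        if penalty == "l1":
            w = [soft_threshold(vj, step * lam) for vj in v]  # ISTA update
        else:
            w = keep_top_k(v, k)  # iterative hard thresholding update
        b -= step * gb
    return w, b


def support(w):
    # Indices of the selected (non-zero) features.
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Synthetic data: 15 candidate features; only features 0, 1 and 2 affect the outcome.
random.seed(0)
true_w = [2.0, -2.0, 1.5] + [0.0] * 12
X = [[random.gauss(0.0, 1.0) for _ in true_w] for _ in range(300)]
y = [1 if random.random() < sigmoid(sum(a * c for a, c in zip(true_w, x))) else 0 for x in X]

w_l1, _ = fit(X, y, "l1")
w_l0, _ = fit(X, y, "l0")
print("L1 support:", support(w_l1))
print("L0 support:", support(w_l0))
```

The two fits differ only in the operator applied after each gradient step: soft thresholding shrinks every coefficient and zeroes the small ones, whereas hard thresholding leaves the retained coefficients unshrunk and zeroes everything else. With a small `lam`, the L1 fit may keep some noise features alongside the informative ones; the exact supports depend on the seed, `lam`, and `k`.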
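The ensemble step and the evaluation metric can be sketched in the same way. The summary states only that 100 predictors trained on bootstrap samples were integrated. This sketch assumes they are combined by averaging predicted probabilities, and to keep the example short it uses a plain (unregularized) logistic regression on two synthetic features as a stand-in for the SLR base models. The AUC is computed as the Mann-Whitney statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, step=1.0, iters=50):
    # Stand-in base model: unregularized logistic regression by gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(sum(a * c for a, c in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += r * xi[j] / n
            gb += r / n
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def bagged_predict(X_train, y_train, X_test, n_models=100, seed=0):
    # Train n_models base models on bootstrap resamples of the training rows.
    # Assumption: predictions are integrated by averaging predicted probabilities.
    rng = random.Random(seed)
    n = len(X_train)
    scores = [0.0] * len(X_test)
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        w, b = fit_logreg([X_train[i] for i in idx], [y_train[i] for i in idx])
        for t, x in enumerate(X_test):
            scores[t] += sigmoid(sum(a * c for a, c in zip(w, x)) + b) / n_models
    return scores


def auc(scores, labels):
    # ROC AUC as the Mann-Whitney statistic: P(positive score > negative score),
    # counting ties as one half.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(m):
    # Hypothetical synthetic data: two Gaussian features with a known logistic effect.
    X = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(m)]
    y = [1 if random.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]
    return X, y


random.seed(1)
X_tr, y_tr = make_data(200)
X_te, y_te = make_data(200)
test_auc = auc(bagged_predict(X_tr, y_tr, X_te), y_te)
print("bagged test AUC:", round(test_auc, 3))
```

Averaging probabilities is only one way to integrate bootstrap predictors; majority voting or averaging the fitted coefficients are alternatives, and the summary does not say which was used. The pairwise AUC loop is quadratic in the number of cases, which is fine for a sketch but would be replaced by a rank-based computation at the scale of 48,123 persons.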