Alzheimer-type dementia prediction by sparse logistic regression using claim data


Bibliographic Details
Published in Computer methods and programs in biomedicine Vol. 196; p. 105582
Main Authors Fukunishi, Hiroaki; Nishiyama, Mitsuki; Luo, Yuan; Kubo, Masahiro; Kobayashi, Yasuki
Format Journal Article
Language English
Published Ireland Elsevier B.V 01.11.2020
ISSN 0169-2607, 1872-7565
DOI 10.1016/j.cmpb.2020.105582

Summary:
• This study developed an Alzheimer-type dementia prediction model based on health insurance claim data and long-term care claim data for elderly people in Japan.
• Feature selection was a critical issue in utilizing claim data, which contain a large amount of information.
• Sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) were used for feature selection.
• SLR-L0 was more effective than SLR-L1 at selecting influential features.

This study aimed to predict the risk of Alzheimer-type dementia in persons aged over 75 years who were not receiving long-term care services, using regularly collected claim data. A refined dataset of 48,123 persons was prepared from health insurance and long-term care insurance claim data in a large city in the metropolitan area of Japan. The features comprise the age and sex of the subjects, 502 diseases based on ICD-10 diagnosis codes, and 107 prescription drugs based on therapeutic classes. The most important challenge in this work was feature selection from a large number of features. We adopted sparse logistic regression models with L0 regularization (SLR-L0) and L1 regularization (SLR-L1) as machine-learning classification models. These regularizations enable feature selection by estimating a sparse solution with few non-zero coefficients during model optimization. Predictions were made by integrating 100 predictors trained on bootstrap samples. The areas under the ROC curve (AUCs) were 0.663 for SLR-L0 and 0.660 for SLR-L1. Although these performances were similar, the average number of selected features, out of a total of 611, was 13 for SLR-L0 and 253 for SLR-L1. The results indicate that SLR-L1 tended to include less useful features, whereas SLR-L0 narrowed the set down to influential features. SLR-L0 might therefore be more useful than SLR-L1 for practical use or for discussing risk factors with medical experts.
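
The pipeline described in the summary (sparse logistic regression, an ensemble of predictors trained on bootstrap samples, evaluation by AUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the L1 model is fitted with a plain proximal-gradient (soft-thresholding) loop, the ensemble averages predicted probabilities, and only 5 bootstrap models are trained instead of the 100 reported. The L0 variant is not sketched, because the record does not say how it was optimized. All function names and parameter values (`lam`, `step`, `n_iter`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.02, step=0.5, n_iter=200):
    # Proximal gradient descent for L1-penalized logistic regression.
    # The intercept b is left unpenalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= step * grad_b / n
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def bagged_fit(X, y, n_models=5, seed=0):
    # Train one sparse model per bootstrap sample (drawn with replacement).
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x):
    # Integrate the ensemble by averaging predicted probabilities (an assumption).
    probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models]
    return sum(probs) / len(probs)


def auc(scores, labels):
    # AUC as the probability that a positive outscores a negative (ties count 0.5).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n=300, d=12, seed=1):
    # Synthetic binary "claim" indicators; only features 0 and 1 affect the label.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [float(rng.random() < 0.3) for _ in range(d)]
        y.append(1 if rng.random() < sigmoid(-1.0 + 2.0 * x[0] + 1.5 * x[1]) else 0)
        X.append(x)
    return X, y


X, y = make_toy_data()
models = bagged_fit(X[:200], y[:200])
scores = [bagged_predict(models, x) for x in X[200:]]
test_auc = auc(scores, y[200:])
# Average number of non-zero coefficients per bootstrap model.
avg_selected = sum(sum(wj != 0.0 for wj in w) for w, _ in models) / len(models)
print(f"AUC = {test_auc:.3f}, average selected features = {avg_selected:.1f}")
```

Counting non-zero coefficients per bootstrap model mirrors the "average number of selected features" reported in the summary; raising the hypothetical `lam` shrinks more coefficients to exactly zero.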