Machine-learning model for predicting depression in second-hand smokers in cross-sectional data using the Korea National Health and Nutrition Examination Survey

Bibliographic Details
Published in: Digital health, Vol. 10, p. 20552076241257046
Main Authors: Kim, Na Hyun; Kim, Myeongju; Han, Jong Soo; Sohn, Hyoju; Oh, Bumjo; Lee, Ji Won; Ahn, Sumin
Format: Journal Article
Language: English
Published: London, England: SAGE Publications, 01.01.2024
Sage Publications Ltd
SAGE Publishing
ISSN: 2055-2076
DOI: 10.1177/20552076241257046

Summary:

Objective: Depression among non-smokers at risk of second-hand smoke (SHS) exposure has been a neglected public health concern despite their vulnerability. The objective of this study was to develop high-performance machine-learning (ML) models for the prediction of depression in non-smokers and to identify important predictors of depression for second-hand smokers.

Methods: ML algorithms were created using demographic and clinical data from the Korea National Health and Nutrition Examination Survey (KNHANES) participants from 2014, 2016, and 2018 (N = 11,463). The Patient Health Questionnaire was used to diagnose depression with a total score of 10 or higher. The final model was selected according to the area under the curve (AUC) or sensitivity. Shapley additive explanations (SHAP) were used to identify influential features.

Results: The light gradient boosting machine (LGBM) with the highest positive predictive value (PPV; 0.646) was selected as the best model among the ML algorithms, whereas the support vector machine (SVM) had the highest AUC (0.900). The most influential factors identified using the LGBM were stress perception, followed by subjective health status and quality of life. Among the smoking-related features, urine cotinine levels were the most important, and no linear relationship existed between the smoking-related features and the values of SHAP.

Conclusions: Compared with the previously developed ML models, our LGBM models achieved excellent and even superior performance in predicting depression among non-smokers at risk of SHS exposure, suggesting potential goals for depression-preventive interventions for non-smokers during public health crises.
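
The outcome definition and the evaluation metrics named in the summary (a questionnaire cutoff of 10, PPV, sensitivity, and AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy scores, and the 0.5 decision threshold are assumptions made purely for illustration, and the data are invented rather than taken from KNHANES.

```python
from typing import Sequence, Tuple

PHQ_CUTOFF = 10  # "total score of 10 or higher", as stated in the summary


def label_depression(phq_total: int, cutoff: int = PHQ_CUTOFF) -> int:
    """Return 1 if the questionnaire total meets the cutoff, else 0."""
    return int(phq_total >= cutoff)


def ppv_and_sensitivity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """PPV = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sens


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
phq_totals = [3, 12, 10, 7, 15, 4]
labels = [label_depression(x) for x in phq_totals]

# Hypothetical model scores; 0.5 is an assumed decision threshold.
scores = [0.2, 0.8, 0.4, 0.5, 0.9, 0.1]
preds = [int(s >= 0.5) for s in scores]

ppv, sens = ppv_and_sensitivity(labels, preds)
roc_auc = auc(labels, scores)
print(f"PPV={ppv:.3f} sensitivity={sens:.3f} AUC={roc_auc:.3f}")
```

The sketch also shows why the two headline numbers in the summary can point to different models: PPV depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds.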
Na Hyun Kim and Myeongju Kim contributed equally to this work.