Exploration and analysis of risk factors for coronary artery disease with type 2 diabetes based on SHAP explainable machine learning algorithm

Bibliographic Details
Published in Scientific Reports Vol. 15; no. 1; pp. 29521 - 19
Main Authors Tang, Dandan, Liang, Fengwei, Gu, Xingli, Jin, Yuanyuan, Hu, Xuanjie, Liu, Fen, Yang, Yining
Format Journal Article
Language English
Published London Nature Publishing Group UK 12.08.2025
Nature Publishing Group
Nature Portfolio
ISSN 2045-2322
DOI 10.1038/s41598-025-11142-3

Summary: Type 2 diabetes mellitus (T2DM) is a major risk factor for coronary heart disease (CHD). In recent years, machine learning algorithms have demonstrated significant advantages in improving predictive accuracy; however, studies applying these methods to the clinical prediction and diagnosis of CHD with T2DM (CHD-DM2) remain limited. This study aims to evaluate the performance of machine learning models and to develop an interpretable model that identifies critical risk factors for CHD-DM2, thereby supporting clinical decision-making. Data were collected from cardiovascular inpatients admitted to the First Affiliated Hospital of Xinjiang Medical University between 2001 and 2018. A total of 12,400 patients were included, comprising 10,257 cases of CHD and 2,143 cases of CHD-DM2. To address the class imbalance in the dataset, the SMOTENC algorithm was applied during data preprocessing using the themis package. Final predictors were identified through a combination of univariate analysis and Lasso regression. We then developed and validated seven machine learning models: Logistic, Logistic_Lasso, KNN, SVM, XGBoost, RF, and LightGBM. The predictive performance of the seven models was compared using accuracy, sensitivity, specificity, the ROC curve and its AUC, and DCA. SHAP values were used to interpret the model outputs. The dataset was split into a training set (n = 8460) and a validation set (n = 3680) at a 7:3 ratio. A total of 25 predictive variables were ultimately identified through Lasso regression analysis. Among the seven machine learning models, the RF model demonstrated significantly superior performance and achieved the highest net benefit in the DCA. According to SHAP analysis, Diabetes.History, blood glucose (BG), and HbA1c were the top contributors to CHD-DM2 risk and were identified as its primary risk factors. It is recommended that hospitals enhance monitoring of such patients, document the presence of high-risk factors, and implement targeted intervention strategies accordingly.
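The summary names accuracy, sensitivity, specificity, and net benefit from decision curve analysis (DCA) as evaluation metrics but does not show how they are computed. The following is a minimal sketch using the standard textbook definitions, with net benefit at threshold probability p_t taken as TP/n - (FP/n) * p_t / (1 - p_t). The function names and the toy labels and probabilities are hypothetical illustrations; they are not the authors' code or data, and the study itself may have used different tooling.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive at prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def net_benefit(y_true, y_prob, threshold):
    """DCA net benefit at one threshold probability: TP/n - FP/n * pt / (1 - pt)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (made up for illustration): 1 = CHD-DM2, 0 = CHD only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

metrics = classification_metrics(y_true, y_prob, threshold=0.5)
nb_model = net_benefit(y_true, y_prob, threshold=0.2)

# "Treat all" reference strategy: every patient is predicted positive.
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.2)
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and compares each model against the "treat all" and "treat none" (net benefit 0) strategies; a model is preferable at a given threshold when its net benefit is highest there.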