Machine learning-driven risk assessment of coronary heart disease: Analysis of NHANES data from 1999 to 2018
Published in | Zhong nan da xue xue bao. Journal of Central South University. Yi xue ban Vol. 49; no. 8; p. 1175 |
---|---|
Main Authors | Lu, Jin; Hu, Haochang; Xiu, Jiaming; Yang, Yanfang; Zhu, Qifeng; Dai, Hanyi; Liu, Xianbao; Wang, Jian'an |
Format | Journal Article |
Language | Chinese English |
Published | China, 28.08.2024 |
ISSN | 1672-7347 |
DOI | 10.11817/j.issn.1672-7347.2024.240394 |
Abstract | The high incidence of coronary artery heart disease (CHD) poses a significant burden and challenge to public health systems globally. Effective prevention and early diagnosis of CHD have become key strategies to alleviate this burden. This study aims to explore the application of advanced machine learning techniques to enhance the accuracy of early screening and risk assessment for CHD.
A total of 49 490 subjects from the National Health and Nutrition Examination Survey (NHANES) database covering 1999 to 2018 were included. The dataset was randomly divided into a training set (70%) and a testing set (30%). The dependent (outcome) variable was whether a subject had been informed of a CHD diagnosis, which divided the subjects into a CHD group and a non-CHD group. Based on a review of the literature on CHD risk factors, 68 independent variables were ultimately included. The characteristics of the study subjects were described and compared between the CHD and non-CHD groups. Two machine learning algorithms, random forest (randomForest_4.7-1.1) and XGBoost (xgboost_1.7.7.1), were used for variable selection: the top 10 variables identified by each algorithm were compared, and the variables ranked in the top 10 by both were retained. A generalized linear model was used to analyze the associations between these variables and CHD, and classical logistic regression was used to construct the CHD risk prediction model. The model's ability to discriminate between CHD and non-CHD individuals was assessed with the area under the receiver operating characteristic curve (AUC); calibration, that is, agreement between predicted probabilities and observed CHD proportions, was assessed with the Hosmer-Lemeshow goodness-of-fit test; and decision curve analysis was used to evaluate the clinical benefit of the model's risk predictions. Finally, a nomogram was constructed to present the risk scoring of the final model visually.
The mean age of the overall population was (49.53±18.31) years, and 51.8% were male. Compared with the non-CHD group, the CHD group was older [(69.05±11.32) years vs (48.67±18.07) years, P<0.001], had a higher proportion of females (67.1% vs 47.4%, P<0.001), and differed significantly in classical cardiovascular risk factors such as body mass index, systolic blood pressure, diastolic blood pressure, and smoking (all P<0.001). There were also statistically significant differences in non-classical cardiovascular factors, such as energy intake, vitamin E, vitamin K, calcium, phosphorus, magnesium, zinc, copper, sodium, potassium, and selenium (all P<0.05). Six key variables most strongly associated with CHD were ultimately identified. The resulting CHD risk prediction model was: logit(p) = -7.783 + 0.074×age + 0.003×creatinine - 0.003×platelets + 0.257×glycated hemoglobin + 0.003×uric acid + 0.101×coefficient of variation of red cell distribution width. The model demonstrated excellent discriminative ability for CHD, with an accuracy of 0.712 and an AUC of 0.841. Calibration curves indicated good agreement between predicted probabilities and observed values in both the training and testing sets, supporting the model's stability and reliability. Decision curve analysis showed that the model provided a net benefit across a range of threshold probabilities, supporting its potential use in clinical decision-making.
This study successfully identified potential risk factors for CHD using machine learning techniques and developed a concise and practical clinical prediction model. Further prospective clinical cohort studies are needed to validate its potential for clinical application, enabling effective cardiovascular disease prevention and intervention strategies in real-world healthcare settings. |
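The reported equation gives the log-odds of CHD; an individual risk estimate follows from the inverse logit, p = 1/(1 + exp(-logit(p))). The Python sketch below only transcribes the six published coefficients. The abstract does not state measurement units, so the units in the comments (µmol/L for creatinine and uric acid, 10^9/L for platelets, % for glycated hemoglobin and RDW-CV) are assumptions, and the function names `chd_logit` and `chd_probability` and the dictionary keys are hypothetical. It is an illustration of the arithmetic, not a validated clinical tool.

```python
import math

# Coefficients transcribed from the model reported in the abstract:
# logit(p) = -7.783 + 0.074*age + 0.003*creatinine - 0.003*platelets
#            + 0.257*glycated hemoglobin + 0.003*uric acid + 0.101*RDW-CV
# The abstract does not give units; the unit comments below are ASSUMPTIONS.
INTERCEPT = -7.783
COEFFICIENTS = {
    "age": 0.074,         # years
    "creatinine": 0.003,  # assumed umol/L
    "platelets": -0.003,  # assumed 10^9/L
    "hba1c": 0.257,       # glycated hemoglobin, assumed %
    "uric_acid": 0.003,   # assumed umol/L
    "rdw_cv": 0.101,      # coefficient of variation of RDW, assumed %
}


def chd_logit(values: dict) -> float:
    """Linear predictor (log-odds of CHD) for one person."""
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(coef * values[name] for name, coef in COEFFICIENTS.items())


def chd_probability(values: dict) -> float:
    """Inverse logit: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-chd_logit(values)))
```

For made-up inputs of age 65, creatinine 80, platelets 230, glycated hemoglobin 6.0, uric acid 350 and RDW-CV 13.0, the linear predictor is -7.783 + 8.265 = 0.482. Because the units are assumed, the resulting probability should not be read clinically.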
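The variable-selection rule in the methods (keep only variables ranked in the top 10 by both random forest and XGBoost) reduces to intersecting two ranked lists. The version strings in the abstract appear to be R package versions, so in the study the scores would presumably come from the fitted models (for example `importance()` in randomForest or `xgb.importance()` in xgboost); which importance measure was used is not stated. This Python sketch therefore takes two plain dictionaries of scores, and the names `top_k` and `consensus_variables` are hypothetical.

```python
from typing import Dict, List


def top_k(importance: Dict[str, float], k: int = 10) -> List[str]:
    """Names of the k highest-scoring variables; ties broken alphabetically."""
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def consensus_variables(rf_importance: Dict[str, float],
                        xgb_importance: Dict[str, float],
                        k: int = 10) -> List[str]:
    """Variables ranked in the top k by BOTH models, in random-forest order."""
    xgb_top = set(top_k(xgb_importance, k))
    return [name for name in top_k(rf_importance, k) if name in xgb_top]
```

Breaking ties by name keeps the output deterministic when two variables share a score. With the 68 candidate variables and k = 10, the result is whatever subset both rankings share; the abstract reports six variables in the final model.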
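The three evaluation steps named in the methods (AUC for discrimination, the Hosmer-Lemeshow test for calibration, and decision curve analysis for net benefit) can be sketched with their standard textbook formulas. This standard-library Python version is an illustration, not the authors' code. It assumes 10 groups formed by sorting on predicted risk (the usual "deciles of risk" variant), a chi-square reference distribution with g - 2 degrees of freedom, and the usual net-benefit definition NB = TP/n - (FP/n)·pt/(1 - pt), where pt is the threshold probability. The p-value helper uses the closed-form chi-square survival function, which holds only for even degrees of freedom (df = 8 when g = 10). All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random non-case
    (ties count 1/2), computed from average ranks (Mann-Whitney U)."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of a tied block
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    groups: int = 10) -> Tuple[float, int, float]:
    """Hosmer-Lemeshow statistic, degrees of freedom and p-value."""
    data = sorted(zip(p, y))
    n = len(data)
    stat = 0.0
    for g in range(groups):
        # Near-equal groups by position; ties at a boundary are split by position.
        chunk = data[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(label for _, label in chunk)
        expected = sum(prob for prob, _ in chunk)
        mean_p = expected / n_g
        variance = n_g * mean_p * (1 - mean_p)
        if variance > 0:  # skip groups whose predictions are all 0 or all 1
            stat += (observed - expected) ** 2 / variance
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y)
    tp = sum(1 for label, prob in zip(y, p) if prob >= threshold and label == 1)
    fp = sum(1 for label, prob in zip(y, p) if prob >= threshold and label == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Reference strategy in decision curve analysis: treat everyone."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve is then drawn by evaluating `net_benefit` over a grid of thresholds and comparing it with `net_benefit_treat_all` and with zero (treat none).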
Author | Yang, Yanfang Xiu, Jiaming Zhu, Qifeng Wang, Jian'an Hu, Haochang Dai, Hanyi Liu, Xianbao Lu, Jin |
Author_xml | – sequence: 1 givenname: Jin surname: Lu fullname: Lu, Jin email: 12318327@zju.edu.cn organization: State Key Laboratory of Transvascular Implantation Devices, Hangzhou 310009 – sequence: 2 givenname: Haochang surname: Hu fullname: Hu, Haochang organization: State Key Laboratory of Transvascular Implantation Devices, Hangzhou 310009 – sequence: 3 givenname: Jiaming surname: Xiu fullname: Xiu, Jiaming organization: Department of Cardiology, Longyan First Affiliated Hospital of Fujian Medical University, Longyan Fujian 364000 – sequence: 4 givenname: Yanfang surname: Yang fullname: Yang, Yanfang organization: Department of Cardiology, Provincial Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou 350001 – sequence: 5 givenname: Qifeng surname: Zhu fullname: Zhu, Qifeng organization: Binjiang Institute of Zhejiang University, Hangzhou 310053, China – sequence: 6 givenname: Hanyi surname: Dai fullname: Dai, Hanyi organization: State Key Laboratory of Transvascular Implantation Devices, Hangzhou 310009 – sequence: 7 givenname: Xianbao surname: Liu fullname: Liu, Xianbao organization: Binjiang Institute of Zhejiang University, Hangzhou 310053, China – sequence: 8 givenname: Jian'an surname: Wang fullname: Wang, Jian'an email: wangjianan111@zju.edu.cn organization: Binjiang Institute of Zhejiang University, Hangzhou 310053, China |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39788507 (View this record in MEDLINE/PubMed) |
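DocumentTitleAlternate | 机器学习驱动的冠心病风险评估:1999至2018年NHANES数据分析 (Chinese-language title; English: Machine learning-driven risk assessment of coronary heart disease: analysis of NHANES data from 1999 to 2018) |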
GeographicLocations | United States |
Issue | 8 |
Keywords | risk assessment; risk factors; National Health and Nutrition Examination Survey; machine learning; coronary artery heart disease |
PMID | 39788507 |
PublicationDate | 2024-08-28 |
PublicationTitle | Zhong nan da xue xue bao. Journal of Central South University. Yi xue ban |
PublicationTitleAlternate | Zhong Nan Da Xue Xue Bao Yi Xue Ban |
PublicationYear | 2024 |
StartPage | 1175 |
SubjectTerms | Aged; Algorithms; Coronary Artery Disease - diagnosis; Coronary Artery Disease - epidemiology; Coronary Artery Disease - etiology; Coronary Disease - epidemiology; Coronary Disease - etiology; Female; Humans; Machine Learning; Male; Middle Aged; Nutrition Surveys; Risk Assessment - methods; Risk Factors; United States - epidemiology |
URI | https://www.ncbi.nlm.nih.gov/pubmed/39788507 https://www.proquest.com/docview/3153916985 |
Volume | 49 |