An efficient approach on risk factor prediction related to cardiovascular disease around Kumbakonam, Tamil Nadu, India, using unsupervised machine learning techniques
| Published in | Scientific Reports, Vol. 15, no. 1, pp. 5369 - 18 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 13.02.2025 (Nature Publishing Group; Nature Portfolio) |
| ISSN | 2045-2322 |
| DOI | 10.1038/s41598-025-89403-4 |
| Summary: | People today suffer from a variety of diseases owing to environmental conditions and living habits. Cardiovascular diseases (CVDs), that is, heart-related diseases, are the leading cause of mortality among all diseases. In earlier times, the lack of technological advancement cost many lives: delayed diagnosis led to delayed treatment. Predicting disease in advance is therefore essential, because it allows the necessary treatment to be provided in time. The present paper deals with risk factor prediction based on unsupervised learning methods and identifies the predominant parameters behind the risk factors using principal component analysis (PCA). We collected patient data of size 130 × 12 from four different laboratories in and around Kumbakonam, Tamil Nadu, India. Several clustering techniques, namely k-means clustering, partitioning around medoids (PAM), hierarchical clustering, and fuzzy clustering, were applied to the patient data, and the results show that the data can be grouped into clusters of “patients with risk” and “patients with no risk”. The optimal number of clusters was determined using the elbow and silhouette methods. The clustering was evaluated using the Hopkins statistic, Dunn’s index, and average silhouette widths. The computed agglomerative coefficients indicate a strong cluster structure in the dataset. Cluster stability was tested using bootstrap cluster analysis, and the results showed that the clusters are highly stable. Feature selection was performed using PCA: of the 12 parameters, Total Cholesterol was found to be the most highly correlated factor, and it plays an important role in identifying risk among patients. | 
|---|---|
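
The summary mentions choosing the number of clusters with the silhouette method after k-means clustering. The following is a minimal sketch of that idea only, not the authors' implementation: it uses synthetic data with the same 130 × 12 shape (two artificial groups standing in for "risk" and "no risk"), a plain Lloyd's k-means, and the average silhouette width. All function names, the random seed, and the data-generating parameters are assumptions made for illustration; the study's patient data and tooling are not reproduced here.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclid(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def average_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            dists = [
                euclid(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[labels[i]]  # cohesion: own cluster
        if a is None:
            continue
        # Separation: nearest other cluster.
        b = min(d for c, d in mean_dist.items() if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Synthetic stand-in data (assumption): 130 rows, 12 features, two groups.
rng = random.Random(0)
low = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(65)]
high = [[rng.gauss(5.0, 1.0) for _ in range(12)] for _ in range(65)]
data = low + high

# Score candidate cluster counts and keep the one with the widest silhouette.
scores = {k: average_silhouette(data, kmeans(data, k, rng)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print("average silhouette by k:", scores)
print("selected k:", best_k)
```

On real laboratory measurements the features would normally be standardized first, since parameters recorded in different units would otherwise dominate the distance computation.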