Latent Feature‐Based Type 2 Diabetes Prediction Using a Hybrid Stacked Sparse Autoencoder and Machine Learning Models
| Published in | Engineering Reports (Hoboken, N.J.), Vol. 7, no. 9 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc. (Wiley), 01.09.2025 |
| Subjects | |
| ISSN | 2577-8196 |
| DOI | 10.1002/eng2.70358 |
| Summary: | ABSTRACT: Early and precise prediction of Type 2 diabetes is vital for effective intervention. However, extracting meaningful insights from high‐dimensional datasets with sparse values remains challenging: sparsity and redundant features often hinder traditional machine learning algorithms' ability to identify informative patterns. While conventional Stacked Sparse Autoencoders (SSAE) can capture key features in dense data, they typically struggle with high‐dimensional sparse data, reducing classification accuracy. To address this limitation, the study proposes a Hybrid Stacked Sparse Autoencoder (HSSAE) algorithm designed for robust feature extraction and classification in sparse data environments. The architecture incorporates L1 and L2 regularization within a binary cross‐entropy loss and employs dropout and batch normalization to improve generalization and training stability. The HSSAE algorithm's performance was tested with a sigmoid classifier and various machine learning techniques. When combined with a sigmoid layer, the model achieved 89% accuracy and an F1 score of 0.89. It also outperformed baseline models when integrated with traditional classifiers; notably, the HSSAE + K‐Nearest Neighbor (KNN) combination achieved an F1 score of 0.91, a recall of 0.98, 90% accuracy, and the lowest Hamming loss of 0.10. Comparative evaluations included baseline classifiers such as Logistic Regression (LR), KNN, Naïve Bayes (NB), AdaBoost, and XGBoost applied directly to the preprocessed dataset. An ablation study tested these classifiers on features extracted via a conventional SSAE. In both cases, the HSSAE algorithm showed superior performance across all metrics. These findings demonstrate the HSSAE algorithm's effectiveness in extracting discriminative features from sparse, high‐dimensional data, emphasizing its potential for clinical decision support systems requiring high accuracy and reliability. Graphical abstract: Overview of the proposed Hybrid Stacked Sparse Autoencoder (HSSAE) framework. |
|---|---|
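
The abstract outlines the full pipeline: an autoencoder whose encoder layers carry L1 and L2 penalties, dropout, and batch normalization, trained with a binary cross‐entropy loss and paired with a sigmoid classification head, after which the latent features are handed to conventional classifiers such as KNN. The sketch below is a minimal, illustrative reconstruction of that pipeline on synthetic data, not the authors' implementation; layer sizes, regularization strengths, dropout rates, and the neighbor count are all assumptions.

```python
# Minimal sketch of an HSSAE-style pipeline (assumed hyperparameters, synthetic data).
import numpy as np
from tensorflow.keras import Model, layers, regularizers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score, hamming_loss

# Synthetic stand-in for a high-dimensional, sparse, [0, 1]-scaled dataset.
rng = np.random.default_rng(0)
X = (rng.random((600, 64)) > 0.7) * rng.random((600, 64))   # ~70% zeros
y = rng.integers(0, 2, size=600)
X_train, X_test, y_train, y_test = X[:480], X[480:], y[:480], y[480:]

# Encoder: L1 activity penalty (sparsity) + L2 weight decay, batch norm, dropout.
inputs = layers.Input(shape=(X.shape[1],))
h = layers.Dense(32, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5),
                 kernel_regularizer=regularizers.l2(1e-4))(inputs)
h = layers.BatchNormalization()(h)
h = layers.Dropout(0.3)(h)
latent = layers.Dense(16, activation="relu",
                      activity_regularizer=regularizers.l1(1e-5),
                      kernel_regularizer=regularizers.l2(1e-4),
                      name="latent")(h)

# Decoder with sigmoid outputs so binary cross-entropy can act as the
# reconstruction loss on [0, 1]-scaled inputs.
recon = layers.Dense(X.shape[1], activation="sigmoid", name="recon")(latent)
# Sigmoid classification head on the latent code (the "HSSAE + sigmoid" setup).
clf = layers.Dense(1, activation="sigmoid", name="clf")(latent)

model = Model(inputs, [recon, clf])
model.compile(optimizer="adam",
              loss={"recon": "binary_crossentropy", "clf": "binary_crossentropy"})
model.fit(X_train, {"recon": X_train, "clf": y_train},
          epochs=20, batch_size=32, verbose=0)

# Extract latent features and hand them to a conventional classifier (KNN),
# mirroring the HSSAE + KNN evaluation reported in the abstract.
encoder = Model(inputs, latent)
Z_train = encoder.predict(X_train, verbose=0)
Z_test = encoder.predict(X_test, verbose=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y_train)
pred = knn.predict(Z_test)
print("accuracy:", accuracy_score(y_test, pred),
      "F1:", f1_score(y_test, pred),
      "recall:", recall_score(y_test, pred),
      "Hamming loss:", hamming_loss(y_test, pred))
```

Sigmoid outputs on the decoder keep reconstructions in [0, 1], which is what makes binary cross‐entropy a sensible reconstruction loss for min‐max‐scaled features. Note also that for a single‐label binary task the Hamming loss equals 1 − accuracy, so the 0.10 value reported in the abstract is consistent with the 90% accuracy figure.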
| Bibliography: | Funding: This work was supported by the YUTP‐FRG (015LC0‐442). |
|---|---|