Latent Feature‐Based Type 2 Diabetes Prediction Using a Hybrid Stacked Sparse Autoencoder and Machine Learning Models
| Published in | Engineering Reports (Hoboken, N.J.), Vol. 7, no. 9 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Hoboken, USA: John Wiley & Sons, Inc. (Wiley), 01.09.2025 |
| Subjects | |
| ISSN | 2577-8196 |
| DOI | 10.1002/eng2.70358 |
| Summary: | ABSTRACT: Early and precise prediction of Type 2 diabetes is vital for effective intervention. However, extracting meaningful insights from high‐dimensional datasets with sparse values remains challenging: sparsity and redundant features often hinder traditional machine learning algorithms' ability to identify informative patterns. While conventional Stacked Sparse Autoencoders (SSAE) can capture key features in dense data, they typically struggle with high‐dimensional sparse data, reducing classification accuracy. To address this limitation, the study proposes a Hybrid Stacked Sparse Autoencoder (HSSAE) algorithm designed for robust feature extraction and classification in sparse data environments. The architecture incorporates L1 and L2 regularization within a binary cross‐entropy loss and employs dropout and batch normalization to improve generalization and training stability. The HSSAE algorithm's performance was tested with a sigmoid classifier and various machine learning techniques. When combined with a sigmoid layer, the model achieved 89% accuracy and an F1 score of 0.89. It also outperformed baseline models when integrated with traditional classifiers; notably, the HSSAE + K‐Nearest Neighbor (KNN) combination achieved an F1 score of 0.91, a recall of 0.98, 90% accuracy, and the lowest Hamming loss of 0.10. Comparative evaluations included baseline classifiers such as Logistic Regression (LR), KNN, Naïve Bayes (NB), AdaBoost, and XGBoost applied directly to the preprocessed dataset. An ablation study tested these classifiers on features extracted via a conventional SSAE. In both cases, the HSSAE algorithm showed superior performance across all metrics. These findings demonstrate the HSSAE algorithm's effectiveness in extracting discriminative features from sparse, high‐dimensional data, emphasizing its potential for clinical decision support systems requiring high accuracy and reliability. Graphical abstract: Overview of the proposed Hybrid Stacked Sparse Autoencoder (HSSAE) framework. |
|---|---|
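
The abstract outlines the full pipeline: an autoencoder whose encoder layers carry L1 and L2 penalties, dropout, and batch normalization, trained with a binary cross‐entropy loss and paired with a sigmoid classification head, after which the latent features are handed to conventional classifiers such as KNN. The sketch below is a minimal, illustrative reconstruction of that pipeline on synthetic data, not the authors' implementation; layer sizes, regularization strengths, dropout rates, and the neighbor count are all assumptions.

```python
# Minimal sketch of an HSSAE-style pipeline (assumed hyperparameters, synthetic data).
import numpy as np
from tensorflow.keras import Model, layers, regularizers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score, hamming_loss

# Synthetic stand-in for a high-dimensional, sparse, [0, 1]-scaled dataset.
rng = np.random.default_rng(0)
X = (rng.random((600, 64)) > 0.7) * rng.random((600, 64))   # ~70% zeros
y = rng.integers(0, 2, size=600)
X_train, X_test, y_train, y_test = X[:480], X[480:], y[:480], y[480:]

# Encoder: L1 activity penalty (sparsity) + L2 weight decay, batch norm, dropout.
inputs = layers.Input(shape=(X.shape[1],))
h = layers.Dense(32, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5),
                 kernel_regularizer=regularizers.l2(1e-4))(inputs)
h = layers.BatchNormalization()(h)
h = layers.Dropout(0.3)(h)
latent = layers.Dense(16, activation="relu",
                      activity_regularizer=regularizers.l1(1e-5),
                      kernel_regularizer=regularizers.l2(1e-4),
                      name="latent")(h)

# Decoder with sigmoid outputs so binary cross-entropy can act as the
# reconstruction loss on [0, 1]-scaled inputs.
recon = layers.Dense(X.shape[1], activation="sigmoid", name="recon")(latent)
# Sigmoid classification head on the latent code (the "HSSAE + sigmoid" setup).
clf = layers.Dense(1, activation="sigmoid", name="clf")(latent)

model = Model(inputs, [recon, clf])
model.compile(optimizer="adam",
              loss={"recon": "binary_crossentropy", "clf": "binary_crossentropy"})
model.fit(X_train, {"recon": X_train, "clf": y_train},
          epochs=20, batch_size=32, verbose=0)

# Extract latent features and hand them to a conventional classifier (KNN),
# mirroring the HSSAE + KNN evaluation reported in the abstract.
encoder = Model(inputs, latent)
Z_train = encoder.predict(X_train, verbose=0)
Z_test = encoder.predict(X_test, verbose=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y_train)
pred = knn.predict(Z_test)
print("accuracy:", accuracy_score(y_test, pred),
      "F1:", f1_score(y_test, pred),
      "recall:", recall_score(y_test, pred),
      "Hamming loss:", hamming_loss(y_test, pred))
```

Sigmoid outputs on the decoder keep reconstructions in [0, 1], which is what makes binary cross‐entropy a sensible reconstruction loss for min‐max‐scaled features. Note also that for a single‐label binary task the Hamming loss equals 1 − accuracy, so the 0.10 value reported in the abstract is consistent with the 90% accuracy figure.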
| Bibliography: | Funding: This work was supported by the YUTP‐FRG (015LC0‐442). |
|---|---|