Enhancing Stroke Prediction with Logistic Regression and Support Vector Machine Using Oversampling Techniques

Bibliographic Details
Published in: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Online) Vol. 9; no. 3; pp. 646-658
Main Authors: Risal, Syamsul, Fajar Apriyadi, A. Sumardin, Andini Dani Achmad, Annisa Nurul Puteri
Format: Journal Article
Language: English
Published: Ikatan Ahli Informatika Indonesia, 01.06.2025
ISSN: 2580-0760
DOI: 10.29207/resti.v9i3.6431

Summary: Stroke is a significant health concern that can result in both death and disability, which makes the early identification of risk factors crucial. Previous studies on stroke prediction have been limited by inadequate handling of class imbalance and by a lack of comprehensive feature selection and parameter optimization, with accuracy rates usually below 80%. This study compares the performance of Logistic Regression (LR) and Support Vector Machine (SVM) algorithms combined with different resampling methods (SMOTE, Borderline-SMOTE, ADASYN, Random Over Sampling (ROS), and Random Under Sampling (RUS)) on a stroke prediction dataset. Correlation-based feature selection identified age, hypertension, and heart disease as significant predictors. GridSearchCV with 10-fold cross-validation was used for hyperparameter optimization, and performance was evaluated using precision, recall, accuracy, and ROC curves. The results showed that SVM outperformed LR across all sampling methods. SVM + ROS achieved the highest performance, with perfect recall (100%), precision of 97.18%, and accuracy of 98.56% (AUC: 0.9857), whereas SVM + Borderline-SMOTE offered balanced performance, with recall of 94.99%, precision of 95.06%, and accuracy of 95.17% (AUC: 0.9512). Among the LR configurations, LR + Borderline-SMOTE performed best, with an accuracy of 84.98% (AUC: 0.8503), above the sub-80% accuracy typical of previous studies. This improved accuracy suggests a clinical benefit: in large-scale screening programs it could reduce missed stroke diagnoses by identifying thousands of additional at-risk patients. Healthcare providers should consider implementing SVM with ROS in critical care settings, where missed stroke cases have severe consequences, whereas SVM with Borderline-SMOTE may be more appropriate for resource-constrained environments.
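
The summary names Random Over Sampling and the precision, recall, and accuracy metrics without showing how they work. The following is a minimal, standard-library Python sketch of both ideas, not the authors' implementation: the function names `random_oversample` and `binary_metrics`, the toy rows (age, hypertension, heart disease), and the toy predictions are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and accuracy from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: (age, hypertension, heart_disease); label 1 = stroke.
rows = [(45, 0, 0), (52, 0, 0), (38, 0, 0), (60, 1, 0),
        (29, 0, 0), (71, 1, 1), (80, 1, 1)]
labels = [0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))

# Hypothetical predictions, used only to exercise the metric formulas.
metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(metrics)
```

In a sketch like this, resampling would normally be applied to the training portion only, so that duplicated minority rows cannot also appear in the data used for evaluation; whether the study did so is not stated in this record.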