A novel and efficient statistical and soft-computing intelligence integrated feature selection technique for human chronic diseases prediction
| Published in | Multimedia tools and applications Vol. 84; no. 33; pp. 41853 - 41896 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York; Springer US; 01.10.2025; Springer Nature B.V |
| Subjects | |
| ISSN | 1573-7721, 1380-7501 |
| DOI | 10.1007/s11042-025-20707-3 |
| Summary: | Due to the exponential increase in data volume, the widespread use of intelligent information systems has created significant obstacles. High dimensionality and the presence of noisy and extraneous data are among these difficulties. They incur high computing costs and considerably affect the accuracy and efficiency of machine learning (ML) methods. Feature selection (FS) is a key idea used to increase classification accuracy and lower computational costs. The fundamental goal of FS is to find the set of features that can accurately determine class labels by removing unnecessary data. However, finding an effective FS strategy is a difficult task that has given rise to a number of algorithms built on biologically inspired soft-computing approaches. To address the difficulties faced during the FS process, this work provides a novel hybrid optimization approach that combines statistical and soft-computing intelligence. The suggested approach was first tested on a diabetes dataset. After yielding encouraging results there, it was tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Data cleaning takes place at the pre-processing stage. Then, in a series of trials, different FS methods were used separately and in hybridized fashion: fine-tuned statistical methods such as lasso (L1 regularization) and chi-square, and the binary Harmony search algorithm (HSA), which is a soft-computing approach. The most efficient strategy was chosen based on the performance metrics. These FS methods pick informative features, which are then used as input for a variety of traditional ML classifiers. The chosen technique is reported along with the influential features it identified and the associated metric values. The classifiers are then evaluated using performance metrics such as accuracy, precision, F-measure, computational time, and recall. The accuracy obtained on both datasets by hybridizing the lasso technique with the HSA is highly encouraging. The proposed hybridized approach achieves over 99% accuracy, 98.9% F1-score, 99% AUC, 97.7% precision and 100% recall on the breast cancer dataset, and 99% accuracy, 99.3% F1-score, 99% AUC, 100% precision and 98.6% recall on the diabetes dataset, which helps physicians make accurate diagnoses and plan effective treatment regimens. The key novelty of the work lies in the fusion of Lasso with HSA, resulting in a hybrid optimization technique that outperforms the individual methods, other hybrid approaches, and other approaches reported in recent state-of-the-art studies. The experimental research shows that the suggested hybrid technique helps clinicians make well-informed judgments, precise diagnoses, and efficient treatment plans for patients, eventually saving lives. It serves as a vital second opinion for them. |
|---|---|
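
The summary names the binary Harmony search algorithm (HSA) as the soft-computing component of the feature selection step but does not describe its implementation. The following is a minimal sketch of a generic binary harmony search over feature masks, not the authors' method. The function name `binary_harmony_search`, its parameters (`memory_size`, `hmcr`, `par`, `iterations`, `seed`) and the `toy_fitness` function are all assumptions introduced for illustration. In a real pipeline the fitness would typically be a classifier's validation score on the selected columns, and a hybrid with lasso could, for example, restrict the search to features with non-zero lasso coefficients; that coupling is also an assumption, since the record does not say how the two methods are fused.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def binary_harmony_search(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    memory_size: int = 10,   # number of masks kept in harmony memory (assumed default)
    hmcr: float = 0.9,       # harmony memory considering rate (assumed default)
    par: float = 0.3,        # pitch adjusting rate, here a bit flip (assumed default)
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximize `fitness` over binary feature masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)

    # Initialize harmony memory with random masks and score each one once.
    memory = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(memory_size)]
    scores = [fitness(m) for m in memory]

    for _ in range(iterations):
        new: Mask = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment as a bit flip
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)

        # Replace the worst stored mask only if the new one scores higher.
        new_score = fitness(new)
        worst = min(range(memory_size), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical stand-in for a classifier score: rewards three "informative"
# features and charges a small penalty per selected feature.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Sequence[int]) -> float:
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = binary_harmony_search(n_features=8, fitness=toy_fitness)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

Because the worst mask in memory is replaced only by a strictly better one, the best score held in memory never decreases across iterations. The penalty term in `toy_fitness` illustrates one common way to favour smaller feature subsets; the weighting used here is arbitrary.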