A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines
| Published in | Expert systems with applications Vol. 127; pp. 47 - 57 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York, Elsevier Ltd, 01.08.2019, Elsevier BV |
| Subjects | |
| ISSN | 0957-4174 1873-6793 |
| DOI | 10.1016/j.eswa.2019.02.037 |
| Summary: | • The PIMA Indian Type-2 diabetes dataset is used. • Pre-processing techniques are combined to obtain high-quality data. • Significant features are found using SVM. • Four bi-objective meta-heuristics are employed to maximize the accuracy and to minimize the number of selected features. • The 10-fold cross validation method is used to validate the constructed model. Diabetes mellitus is a medical condition studied by data miners because of, among other reasons, its significant health complications for affected people and its economic impact on healthcare networks. To find the main causes of the disease, researchers examine patients' lifestyles, hereditary information, and similar factors. The goal of data mining in this context is to find patterns that make early detection and proper treatment of the disease easier. Because of the high volume of data involved in therapeutic contexts and disease diagnosis, providing the intended treatment method becomes almost impossible over a short period of time. This justifies the use of pre-processing techniques and data reduction methods in such contexts, where clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first used to detect and delete outliers. Then, to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the smallest number of significant features with the highest classification accuracy using support vector machines (SVM). In addition, the 10-fold cross validation (CV) method is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly algorithm (MOFA) and the multi-objective imperialist competitive algorithm (MOICA), each with a 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), with accuracies of 98.2% and 94.6%, respectively. |
|---|---|
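
The summary describes a bi-objective search: maximize classification accuracy while minimizing the number of selected features. The record gives no implementation details, so the following is only a minimal sketch of the underlying idea of Pareto dominance between candidate feature subsets, not the paper's method. The `Candidate` type, the `dominates` and `pareto_front` helpers, the 8-position feature masks, and every accuracy value are hypothetical illustrations; in practice each accuracy would come from something like a 10-fold cross-validated SVM.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One evaluated feature subset. All values used below are hypothetical."""
    mask: Tuple[int, ...]   # 1 = feature kept, 0 = feature dropped
    accuracy: float         # objective 1: to be maximized

    @property
    def n_features(self) -> int:
        # objective 2: number of selected features, to be minimized
        return sum(self.mask)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up candidates for illustration only (not results from the paper).
candidates = [
    Candidate((1, 1, 1, 1, 1, 1, 1, 1), 0.90),  # all 8 features
    Candidate((0, 1, 0, 0, 0, 1, 0, 1), 0.90),  # 3 features, same accuracy
    Candidate((0, 1, 0, 0, 0, 0, 0, 0), 0.75),  # 1 feature
    Candidate((1, 1, 0, 0, 0, 1, 0, 1), 0.85),  # 4 features
    Candidate((0, 1, 0, 0, 1, 1, 1, 1), 0.95),  # 5 features
]

front = pareto_front(candidates)
for c in front:
    print(c.n_features, c.accuracy)
```

A multi-objective meta-heuristic such as NSGA-II repeatedly generates new masks and uses a dominance test of this kind to decide which subsets survive. The final front offers a trade-off between subset size and accuracy rather than a single answer.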