A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

Bibliographic Details
Published in	Expert Systems with Applications, Vol. 127, pp. 47-57
Main Authors	Alirezaei, Mahsa; Niaki, Seyed Taghi Akhavan; Niaki, Seyed Armin Akhavan
Format	Journal Article
Language	English
Published	New York Elsevier Ltd 01.08.2019 Elsevier BV
Subjects	Algorithms; Classification; Cluster analysis; Clustering; Data analysis; Data mining; Data reduction; Diabetes; Diabetes diagnosis; Diabetes mellitus; Diagnosis; Economic impact; Evolutionary algorithms; Feature selection; Genetic algorithms; Heuristic methods; Impact analysis; K-means algorithms; Meta-heuristic algorithms; Miners; Multiple objective analysis; Noise reduction; Outliers (statistics); Particle swarm optimization; Sorting algorithms; Support vector machine; Support vector machines; Vector quantization
ISSN	0957-4174 1873-6793
DOI	10.1016/j.eswa.2019.02.037

Summary:	•The PIMA Indian Type-2 diabetes dataset is used.
•Pre-processing techniques are combined to obtain high-quality data.
•Significant features are found using SVM.
•Four bi-objective meta-heuristics are employed to maximize the accuracy and to minimize the number of selected features.
•The 10-fold cross validation method is used to validate the constructed model.

Diabetes mellitus is a medical condition that data miners study for reasons that include the significant health complications in affected people and the economic impact on healthcare networks. To find the main causes of the disease, researchers examine patients' lifestyle, hereditary information, and similar factors. The goal of data mining in this context is to find patterns that make early detection of the disease and proper treatment easier. Because therapeutic and diagnostic settings involve a high volume of data, providing the intended treatment within a short period of time becomes almost impossible. This justifies the use of pre-processing techniques and data reduction methods in such contexts, where clustering and meta-heuristic algorithms play important roles. In this paper, a method based on the k-means clustering algorithm is first utilized to detect and delete outliers. Then, to select significant and effective features, four bi-objective meta-heuristic algorithms are employed to choose the smallest number of significant features with the highest classification accuracy using support vector machines (SVM). In addition, the 10-fold cross validation (CV) method is used to validate the constructed model. Using real case data, it is concluded that the multi-objective firefly algorithm (MOFA) and the multi-objective imperialist competitive algorithm (MOICA), each with 100% classification accuracy, outperform the non-dominated sorting genetic algorithm (NSGA-II) and multi-objective particle swarm optimization (MOPSO), which reach accuracies of 98.2% and 94.6%, respectively.
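
The pipeline outlined in the summary (k-means outlier removal, then a two-objective search over feature subsets scored by SVM with 10-fold CV) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation. Several parts are assumptions made only for illustration: the data are synthetic rather than the PIMA dataset, the outlier rule (drop points beyond the 95th percentile of distance to their own centroid) is a guess at one plausible k-means-based filter, and random feature masks stand in for the MOFA, MOICA, NSGA-II, and MOPSO searches. The helper names `remove_outliers_kmeans`, `objectives`, and `dominates` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def remove_outliers_kmeans(X, y, n_clusters=2, quantile=0.95, seed=0):
    """Drop samples that lie unusually far from their own k-means centroid.

    Assumption: the distance-quantile rule is illustrative; the summary does
    not say how the paper turns k-means output into an outlier decision.
    """
    Xs = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Xs)
    dist = np.linalg.norm(Xs - km.cluster_centers_[km.labels_], axis=1)
    keep = dist <= np.quantile(dist, quantile)
    return X[keep], y[keep]


def objectives(mask, X, y):
    """Return (mean 10-fold CV accuracy of an SVM, number of features used)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0, 0  # an empty subset cannot be classified
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    acc = cross_val_score(model, X[:, mask], y, cv=10).mean()
    return float(acc), int(mask.sum())


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    (acc_a, k_a), (acc_b, k_b) = a, b
    return acc_a >= acc_b and k_a <= k_b and (acc_a > acc_b or k_a < k_b)


# Synthetic stand-in data (assumption: NOT the PIMA dataset).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_clean, y_clean = remove_outliers_kmeans(X, y)

# Random candidate subsets stand in for a meta-heuristic's population.
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)
scores = [objectives(m, X_clean, y_clean) for m in masks]

# Keep only the non-dominated (accuracy, feature-count) trade-offs.
pareto = [(m, s) for m, s in zip(masks, scores)
          if not any(dominates(t, s) for t in scores)]
```

Each entry of `pareto` pairs a feature mask with its (accuracy, feature count) score. A bi-objective meta-heuristic would replace the random masks with an iterative search that keeps refining this non-dominated set.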