Bio-inspired dimensionality reduction for Parkinson’s disease (PD) classification

Bibliographic Details
Published in	Health information science and systems Vol. 8; no. 1; p. 13
Main Authors	Pasha, Akram; Latha, P. H.
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing, 01.12.2020; BioMed Central Ltd; Springer Nature B.V.
Subjects	Accuracy; Algorithms; Bioinformatics; Biomimetics; Classification; Classifiers; Columns (structural); Computational Biology/Bioinformatics; Computer Science; Data mining; Datasets; Decision trees; Discriminant analysis; Diseases; Gaussian process; Genetic algorithms; Health care information services; Health Informatics; Information Systems and Communication Service; Machine learning; Multilayer perceptrons; Optimization theory; Parkinson's disease; Particle swarm optimization; Prejudice; Radial basis function; Reduction; Regression analysis; Support vector machines; Dimensionality reduction; Feature selection; Genetic algorithm; Binary particle swarm optimization; Bio-inspired computing
ISSN	2047-2501
DOI	10.1007/s13755-020-00104-w

Summary:	Given the demand for efficient Machine Learning (ML) classification models for healthcare data, and the potential of Bio-Inspired Optimization (BIO) algorithms to handle high-dimensional data, we investigate a range of ML classification models trained on an optimal subset of the features of a PD data set for efficient PD classification. We used two BIO algorithms, Genetic Algorithm (GA) and Binary Particle Swarm Optimization (BPSO), to determine the optimal feature subset. The data set chosen for investigation comprises 756 observations (rows, or records) over 755 attributes (columns, dimensions, or features) from 252 PD patients. We normalized the data with max-absolute (MaxAbs) feature scaling and used hold-out validation to avoid biased results, splitting the data into training and testing sets in a 70%/30% ratio. We then applied GA and BPSO separately to 11 ML classifiers (Logistic Regression (LR), linear Support Vector Machine (lSVM), radial basis function Support Vector Machine (rSVM), Gaussian Naïve Bayes (GNB), Gaussian Process Classifier (GPC), k-Nearest Neighbor (kNN), Decision Tree (DT), Random Forest (RF), Multilayer Perceptron (MLP), AdaBoost (AB) and Quadratic Discriminant Analysis (QDA)) to determine, for each classifier, the feature subset (that is, the dimensionality reduction) that yields the highest classification accuracy.
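The preprocessing and search described in this part of the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`max_abs_scale`, `holdout_split`, `centroid_accuracy`, `ga_select`), the nearest-centroid stand-in classifier, the GA settings, the synthetic data, and the choice to score each candidate feature mask by hold-out accuracy are all hypothetical. The record does not state the fitness function or GA parameters that were actually used.

```python
import random

def max_abs_scale(rows):
    """Divide each column by its largest absolute value, mapping it into [-1, 1]."""
    n_cols = len(rows[0])
    scale = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n_cols)]
    return [[r[j] / scale[j] for j in range(n_cols)] for r in rows]

def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle once, then hold out `test_fraction` of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def centroid_accuracy(mask, Xtr, ytr, Xte, yte):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features.
    (Assumed stand-in for the 11 classifiers named in the summary.)"""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(ytr):
        members = [x for x, t in zip(Xtr, ytr) if t == label]
        centroids[label] = [sum(x[j] for x in members) / len(members) for j in cols]
    def sq_dist(x, centre):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centre))
    hits = sum(min(centroids, key=lambda c: sq_dist(x, centroids[c])) == t
               for x, t in zip(Xte, yte))
    return hits / len(Xte)

def ga_select(fitness, n_features, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Evolve boolean feature masks and return the fittest mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]
        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]
        nxt = [best]  # elitism: the best mask so far always survives
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            # uniform crossover, then bit-flip mutation
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

# Synthetic demo data (NOT the PD data set): columns 0 and 1 carry the class signal.
data_rng = random.Random(1)
y = [i % 2 for i in range(90)]
X = [[(3.0 if t else -3.0) + data_rng.gauss(0, 1) for _ in range(2)]
     + [data_rng.gauss(0, 5) for _ in range(6)] for t in y]
Xtr, ytr, Xte, yte = holdout_split(max_abs_scale(X), y)  # 70% / 30%
score = lambda mask: centroid_accuracy(mask, Xtr, ytr, Xte, yte)
best_mask = ga_select(score, n_features=8)
print(sum(best_mask), "features kept, hold-out accuracy", score(best_mask))
```

The demo scales before splitting, mirroring the order given in the summary. Fitting the scaler on the training rows only would avoid leaking test-set statistics into training.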
Among the GA-inspired classifiers, MLP produced the maximum dimensionality reduction, 52.32%, selecting only 359 features while delivering 85.1% classification accuracy, and AB delivered the maximum classification accuracy, 90.7%, with a dimensionality reduction of 41.43% (441 features selected). Among the BPSO-inspired classifiers, GNB produced the maximum dimensionality reduction, 47.14%, selecting 396 features with a classification accuracy of 79.3%, and MLP delivered the maximum classification accuracy, 89%, with a dimensionality reduction of 46.48% (403 features selected).
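As a rough check on how the reduction percentages relate to the feature counts, the short sketch below computes the share of candidate features discarded. The function name `reduction_pct` and the figure of 753 candidate features are assumptions: the record gives 755 columns, and the reported percentages for GA-MLP, GA-AB and BPSO-MLP match a pool of 753 (presumably the columns left after setting aside two non-predictor columns, such as an identifier and the class label).

```python
def reduction_pct(n_selected, n_total):
    """Percentage of candidate features discarded by the selection step."""
    return 100.0 * (1.0 - n_selected / n_total)

N_CANDIDATES = 753  # assumption: 755 columns minus two non-predictor columns

for name, kept in [("GA-MLP", 359), ("GA-AB", 441), ("BPSO-MLP", 403)]:
    print(f"{name}: {reduction_pct(kept, N_CANDIDATES):.2f}% reduction")
# GA-MLP: 52.32% reduction
# GA-AB: 41.43% reduction
# BPSO-MLP: 46.48% reduction
```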