A hybrid feature selection algorithm for microarray data

Bibliographic Details
Published in	The Journal of supercomputing Vol. 76; no. 5; pp. 3494 - 3526
Main Authors	Zheng, Yuefeng; Li, Ying; Wang, Gang; Chen, Yupeng; Xu, Qian; Fan, Jiahao; Cui, Xueting
Format	Journal Article
Language	English
Published	New York: Springer US, 01.05.2020; Springer Nature B.V.
Subjects	Accuracy; Algorithms; Classification; Compilers; Computer Science; Datasets; Genes; Interpreters; Processor Architectures; Programming Languages; Reduction; Redundancy; Feature selection; Minimum redundancy maximum relevance; Support vector machine; Grey wolf optimizer
ISSN	0920-8542 1573-0484
DOI	10.1007/s11227-018-2640-y

Summary:	In each microarray data set, only a small number of genes are beneficial, and the high dimensionality of the data keeps gene selection a challenging problem. To address it, we propose a dimensionality reduction algorithm named K value maximum relevance minimum redundancy improved grey wolf optimizer (KMR²IGWO). First, the KMR² stage selects K genes. Second, the K genes are initialized in two ways: by random feature selection and by selecting features in different proportions. Finally, the IGWO algorithm seeks the optimal classification accuracy and the optimal gene combination by adjusting the parameters of the fitness function. The algorithm has a significant dimensionality reduction effect and is suitable for high-dimensional data sets. Experimental results show that the proposed KMR²IGWO strategy significantly reduces the dimension of microarray data and removes redundant features. On 14 microarray data sets, the proposed algorithm outperforms four comparison algorithms (mRMR + PSO, mRMR + GA, mRMR + BA, and mRMR + CS) in classification accuracy and feature subset length. On five data sets, its average classification accuracy is 100%. Across the 14 data sets, the dimensionality reduction range is between 0.4% and 0.04%.
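
The first stage described in the summary, choosing K genes by maximum relevance and minimum redundancy, can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact criterion, so the code assumes the common "relevance minus mean redundancy" form of mRMR on already discretized expression values. The function names (`mutual_information`, `mrmr_select`, `fitness`), the toy data, and the weighting parameter `alpha` are all hypothetical, and the IGWO search itself is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k gene indices by relevance minus mean redundancy.

    features: list of columns, one per gene, each a list of discretized levels.
    labels:   list of class labels, one per sample.
    ASSUMPTION: the difference form of mRMR; the paper may use another variant.
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = set(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def fitness(accuracy, n_selected, n_candidates, alpha=0.9):
    """HYPOTHETICAL weighted fitness: reward accuracy, penalize subset size.

    The summary only says the fitness function has adjustable parameters;
    this particular form and the default alpha are assumptions.
    """
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_candidates)


# Toy data: 8 samples, 3 genes; gene 1 is an exact copy of gene 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
picked = mrmr_select(features, labels, k=2)
print(picked)
```

On this toy data the duplicated gene is skipped: after gene 0 is chosen, gene 1 adds only redundancy, so the less relevant but less redundant gene 2 is picked instead. In a wrapper stage such as the one the summary describes, a function like `fitness` would score each candidate subset of the K genes, with `accuracy` supplied by a classifier.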