A Novel Feature Selection Method on Mutual Information and Improved Gravitational Search Algorithm for High Dimensional Biomedical Data

Bibliographic Details
Published in 2021 13th International Conference on Computer and Automation Engineering (ICCAE), pp. 24-30
Main Authors Yan, Chaokun; Kang, Xi; Li, Mengyuan; Wang, Jianlin
Format Conference Proceeding
Language English
Published IEEE 20.03.2021
DOI 10.1109/ICCAE51876.2021.9426130

Summary: In the past few decades, the field of bioinformatics has accumulated a large amount of gene expression data, which provides important support for the diagnosis of disease. However, high dimensionality, small sample sizes, and redundant features often adversely affect the accuracy and speed of prediction. Existing feature selection models cannot extract the information in these datasets accurately. Filter and wrapper are two commonly used feature selection methods. Combining the fast calculation speed of the filter with the high accuracy of the wrapper, a new hybrid algorithm called MIIBGSA is proposed, which hybridizes mutual information and an improved Gravitational Search Algorithm (GSA). First, mutual information is used to rank and select important features; these features are then passed into the population of the wrapper method. Then, owing to its effectiveness, GSA is adopted to further seek an optimal feature subset. However, GSA also has the disadvantages of slow search speed and premature convergence, which limit its optimization ability. In our work, a scale function is added to the speed update to enhance its search ability, an adaptive Kbest particle update formula is proposed to improve its convergence accuracy, and a fitness sharing strategy, based on the niche algorithm of fitness sharing, is proposed to enhance the randomness of the particle population and its search ability. We used 10-fold cross-validation with the KNN classifier to evaluate classification accuracy. Experimental results on five publicly available high-dimensional biomedical data sets show that the proposed MIIBGSA outperforms other algorithms.
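The filter stage and the wrapper's fitness evaluation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not state the mutual information estimator, the discretization, the number of neighbours, or the fold assignment, so the discrete plug-in estimator, `k=1`, the interleaved folds, the function names, and the toy data below are all assumptions made for illustration. The improved GSA search itself is not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_features(rows, labels):
    """Filter step: feature indices sorted by decreasing MI with the class label."""
    n_features = len(rows[0])
    scores = [
        mutual_information([r[j] for r in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

def knn_cv_accuracy(rows, labels, subset, k=1, folds=10):
    """Wrapper fitness: k-fold cross-validated k-NN accuracy on the chosen columns."""
    n = len(rows)
    folds = min(folds, n)
    correct = 0
    for f in range(folds):
        # Assumed split: sample i belongs to fold i % folds.
        train_idx = [i for i in range(n) if i % folds != f]
        for t in range(f, n, folds):
            # Squared Euclidean distance restricted to the selected features.
            dists = sorted(
                (sum((rows[t][j] - rows[i][j]) ** 2 for j in subset), i)
                for i in train_idx
            )
            votes = Counter(labels[i] for _, i in dists[:k])
            if votes.most_common(1)[0][0] == labels[t]:
                correct += 1
    return correct / n

# Made-up toy data: feature 0 copies the label, feature 1 is constant,
# feature 2 is independent of the label.
rows = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1],
        [0, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 0]]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

order, scores = rank_features(rows, labels)
top = order[:1]  # keep only the highest-ranked feature
acc = knn_cv_accuracy(rows, labels, top, k=1, folds=4)  # 4 folds: only 8 samples
```

In a hybrid of this kind, the ranked features would seed the candidate subsets that the search algorithm refines, with a function like `knn_cv_accuracy` scoring each candidate. Real gene expression values are continuous, so they would need discretization or a different estimator before the plug-in formula applies.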