FG-HFS: A feature filter and group evolution hybrid feature selection algorithm for high-dimensional gene expression data

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 245, p. 123069
Main Authors: Xu, Zhaozhao; Yang, Fangyuan; Tang, Chaosheng; Wang, Hong; Wang, Shuihua; Sun, Junding; Zhang, Yudong
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 July 2024
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.123069

Summary: Gene expression data are characterized by high dimensionality and small sample sizes, and they contain a large number of genes unrelated to disease. Feature selection improves the efficiency of disease diagnosis by selecting a small number of important genes. Unfortunately, existing algorithms do not consider the correlation between features, and their search algorithms tend to become trapped in local optima during the feature search. To this end, this paper proposes a feature filter and group evolution hybrid feature selection algorithm (FG-HFS) for high-dimensional gene expression data. Unlike existing algorithms, FG-HFS uses spectral clustering to place redundant features in the same group. A redundant feature filter algorithm then removes the redundant features within each group according to the principle of the approximate Markov blanket, and the remaining features are divided evenly by density according to the feature exponential strategy. Most importantly, a group evolution multi-objective genetic algorithm searches the filtered feature subsets and evaluates candidate subsets from both in-group and out-group perspectives, selecting the subsets with the highest accuracy and the fewest features. Experimental results on 5 gene expression datasets show that the feature subsets (FSs) selected by FG-HFS achieve an average accuracy (ACC) of 92.76% and an average Matthews correlation coefficient (MCC) of 88.76%, significantly better than existing algorithms. FG-HFS also outperforms existing algorithms on the FSs and ACC/FSs indexes. Moreover, Wilcoxon and Friedman statistical tests show that the feature selection performance of FG-HFS is significantly better than that of existing algorithms in both pairwise and multiple comparisons.
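As a minimal sketch, assuming the features have already been discretized and grouped (for example by spectral clustering), the approximate-Markov-blanket filtering step can be illustrated with the symmetric-uncertainty criterion used by the FCBF filter: feature F_j is taken as an approximate Markov blanket of F_i when SU(F_j, C) ≥ SU(F_i, C) and SU(F_i, F_j) ≥ SU(F_i, C), in which case F_i is dropped. This record does not state FG-HFS's exact criterion, discretization, or ordering, so these are assumptions, and the function names (`entropy`, `symmetric_uncertainty`, `amb_filter_group`) are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a normalized score in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


def amb_filter_group(group, features, labels):
    """Drop redundant features from one feature group (hypothetical helper).

    group:    list of feature names belonging to one cluster
    features: dict mapping feature name -> list of discretized values
    labels:   list of class labels, aligned with the feature values

    Assumption: FCBF-style approximate Markov blanket test, not necessarily
    the exact rule used by FG-HFS.
    """
    # Relevance of each feature to the class C.
    su_c = {f: symmetric_uncertainty(features[f], labels) for f in group}
    # Visit features from most to least relevant (stable sort keeps ties in order).
    ranked = sorted(group, key=lambda f: su_c[f], reverse=True)
    kept = []
    for fi in ranked:
        # fi is redundant if some already-kept fj is an approximate Markov blanket for it.
        redundant = any(
            su_c[fj] >= su_c[fi]
            and symmetric_uncertainty(features[fi], features[fj]) >= su_c[fi]
            for fj in kept
        )
        if not redundant:
            kept.append(fi)
    return kept
```

For example, with `features = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}` and `labels = [0, 1, 2, 3]`, `amb_filter_group(["a", "b", "c"], features, labels)` keeps `["a", "c"]`: `b` duplicates `a`, while `c` carries class information that `a` does not.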
• We propose using spectral clustering to group the features so that in-group feature similarity is extremely high.
• We propose a redundant feature filter algorithm, since existing algorithms cannot filter redundant features.
• We propose using the group evolution multi-objective genetic algorithm to search the filtered feature subsets.
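Selecting subsets with the highest accuracy and the fewest features is a two-objective problem. A common way to compare candidates in such a search, assumed here rather than taken from the paper, is Pareto dominance: one subset dominates another if it is no worse on both objectives and strictly better on at least one. The sketch below shows only this dominance test and the resulting non-dominated front; it does not implement the group evolution operators, the accuracy values stand in for whatever classifier evaluation the paper uses, and the names (`Candidate`, `dominates`, `pareto_front`) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A candidate is (selected feature indices, accuracy). Hypothetical representation.
Candidate = Tuple[FrozenSet[int], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.

    Objectives (assumed): maximize accuracy, minimize number of selected features.
    """
    acc_a, acc_b = a[1], b[1]
    size_a, size_b = len(a[0]), len(b[0])
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, preserving their original order.

    A candidate never dominates itself, so no self-exclusion check is needed.
    """
    return [c for c in population if not any(dominates(o, c) for o in population)]


population: List[Candidate] = [
    (frozenset({1, 2, 3}), 0.90),
    (frozenset({1, 2}), 0.90),
    (frozenset({4}), 0.80),
    (frozenset({1, 2, 3, 4}), 0.95),
    (frozenset({5, 6}), 0.85),
]
front = pareto_front(population)
```

In this illustrative population, `{1, 2, 3}` at 0.90 is dropped because `{1, 2}` reaches the same accuracy with fewer features, and `{5, 6}` at 0.85 is dropped because `{1, 2}` is more accurate at the same size; the remaining three subsets trade accuracy against size.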