An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data

Bibliographic Details
Published in: Computer Methods and Programs in Biomedicine, Vol. 244, p. 107987
Main Authors: Yang, Guicheng; Li, Wei; Xie, Weidong; Wang, Linjie; Yu, Kun
Format: Journal Article
Language: English
Published: Ireland, Elsevier B.V., 01.02.2024
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2023.107987

Summary: The limited number of samples and the high dimensionality of microarray data make selecting a small set of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature subset within a limited time when the feature space is high-dimensional. To address these problems, this paper proposes a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering with an improved binary particle swarm optimization and embeds a feature elimination strategy. First, an adaptive redundant-feature judgment method based on correlation clustering screens the features to reduce the search space for the subsequent stage. Second, an improved flipping-probability-based binary particle swarm optimization (IFBPSO) is proposed that is better suited to binary search spaces. Finally, a new feature elimination (FE) strategy embedded in the binary particle swarm optimization gradually removes poorer features during the iterations, reducing the number of features and improving accuracy. C-IFBPFE was compared with other published hybrid feature selection methods on eight public datasets, and the impact of each improvement was analyzed. The proposed method outperforms other state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study confirms the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the algorithm's performance. The proposed hybrid feature selection method thus helps address the problem of high-dimensional microarray data with few samples.
The method selects a small subset of features and achieves high classification accuracy on microarray datasets. In addition, independent validation shows that the features selected by C-IFBPFE correlate strongly with disease phenotypes, indicating that the method can identify important biomarkers from biomedical data.
• Proposes a clustering-based redundant feature judgment method.
• Updates particles using flipping probabilities, which is better suited to binary space.
• Proposes a feature elimination strategy embedded in PSO to identify and remove poor features.
• Experiments on the datasets confirm the superiority of the proposed method.
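The summary does not give the actual IFBPSO update rule or the FE criterion, so the following is only a minimal sketch of the two general ideas under stated assumptions, not the authors' algorithm. It assumes a V-shaped transfer function, |tanh(v)|, as the probability of flipping each bit (one common flip-based binary PSO variant), and a hypothetical elimination rule: every `elim_every` iterations, the active feature that the global best does not use and that is set least often across the personal bests is removed from the search. `toy_fitness`, `flip_bpso`, and all parameter values are hypothetical; the correlation-clustering screening stage is not shown.

```python
import math
import random


def toy_fitness(mask, relevant, penalty=0.05):
    """Hypothetical stand-in for classifier accuracy: reward 'relevant' bits, penalize subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - penalty * sum(mask)


def flip_bpso(n_features, fitness, n_particles=20, n_iter=60, w=0.7,
              c1=1.5, c2=1.5, v_max=4.0, elim_every=10, seed=0):
    """Flip-based binary PSO with a hypothetical embedded feature-elimination step.

    Returns (gbest mask, gbest fitness, set of features still active).
    """
    rng = random.Random(seed)
    active = set(range(n_features))  # features not yet eliminated
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pfit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]

    for t in range(1, n_iter + 1):
        for i in range(n_particles):
            for d in active:
                r1, r2 = rng.random(), rng.random()
                v = (w * V[i][d]
                     + c1 * r1 * (pbest[i][d] - X[i][d])
                     + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))
                # V-shaped transfer (assumed): |tanh(v)| is the probability of flipping this bit
                if rng.random() < abs(math.tanh(V[i][d])):
                    X[i][d] = 1 - X[i][d]
            f = fitness(X[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = X[i][:], f
                if f > gfit:
                    gbest, gfit = X[i][:], f

        # Hypothetical elimination rule: every elim_every iterations, drop the active
        # feature that gbest does not use and that is set least often across pbests.
        if t % elim_every == 0:
            candidates = [d for d in active if gbest[d] == 0]
            if candidates:
                worst = min(candidates, key=lambda d: (sum(p[d] for p in pbest), d))
                active.discard(worst)
                for i in range(n_particles):
                    X[i][worst], V[i][worst], pbest[i][worst] = 0, 0.0, 0
                    pfit[i] = fitness(pbest[i])
                # Pruned pbests may now score higher than gbest, so refresh it.
                g = max(range(n_particles), key=lambda i: pfit[i])
                if pfit[g] > gfit:
                    gbest, gfit = pbest[g][:], pfit[g]
    return gbest, gfit, active


if __name__ == "__main__":
    relevant = {1, 4, 7}  # hypothetical "informative" features in a 15-feature toy problem
    best, score, active = flip_bpso(15, lambda m: toy_fitness(m, relevant))
    print("selected:", [d for d, bit in enumerate(best) if bit], "fitness:", score)
```

In a real pipeline, `fitness` would more likely score the masked subset with cross-validated classification accuracy on the features that survive the screening stage; the toy objective only makes the search and elimination mechanics visible. Eliminated features are frozen at 0 in every particle, so the effective search space shrinks as the iterations proceed.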