A problem-specific non-dominated sorting genetic algorithm for supervised feature selection

Bibliographic Details
Published in: Information Sciences, Vol. 547, pp. 841-859
Main Authors: Zhou, Yu; Zhang, Wenjun; Kang, Junhao; Zhang, Xiao; Wang, Xu
Format: Journal Article
Language: English
Published: Elsevier Inc, 08.02.2021
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2020.08.083

Summary:
• Supervised feature selection of high-dimensional data is formulated as an MOP.
• We developed a problem-specific non-dominated sorting genetic algorithm to solve the MOP.
• We made a systematic comparison between our method and some state-of-the-art FS approaches.

Feature selection (FS), which plays an important role in classification tasks, has recently been studied as a multi-objective optimization problem (MOP). In this paper, we consider minimizing three objectives of FS and propose a problem-specific non-dominated sorting genetic algorithm (PS-NSGA). PS-NSGA applies an accuracy-preferred domination operator, which makes individuals with higher classification accuracy more likely to survive in the population. It also uses a quick bit mutation, which overcomes the limitations of traditional bit-string mutation and improves efficiency. In addition, a mutation-retry operator and a combination operator are designed to make the algorithm converge faster and to better solutions. Finally, a solution selection strategy is developed to determine the most suitable feature subset from the obtained Pareto solutions. Experimental results on 15 real-world high-dimensional datasets demonstrate that the proposed algorithm achieves competitive classification accuracy while obtaining smaller feature subsets than some state-of-the-art evolutionary and traditional FS algorithms.
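
For readers unfamiliar with non-dominated sorting, the following is a minimal Python sketch of standard Pareto dominance and first-front extraction for objectives that are all minimized. It is not the paper's accuracy-preferred domination operator, which the summary does not define. The function names and the objective vectors (an error rate, a fraction of features kept, and an unspecified third cost) are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # every objective is to be minimized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` under minimization."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(points: List[Objectives]) -> List[int]:
    """Return indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective vectors, one per candidate feature subset:
# (error rate, fraction of features kept, illustrative third cost)
candidates = [
    (0.10, 0.30, 0.5),
    (0.12, 0.10, 0.4),
    (0.10, 0.35, 0.6),  # no better than the first candidate in any objective
    (0.20, 0.05, 0.2),
]

front = first_front(candidates)
print(front)
```

In this toy data the third candidate is dominated by the first, so it is excluded from the front. A full non-dominated sort would repeat the extraction on the remaining points to build successive fronts.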