Feature selection using binary horse herd optimization algorithm with lightGBA ensemble classification in microarray data
| Published in | Knowledge-Based Systems, Vol. 312, p. 113168 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V., 15.03.2025 |
| ISSN | 0950-7051 |
| DOI | 10.1016/j.knosys.2025.113168 |
| Summary: | Microarray data analysis is challenging because of the data's high dimensionality, imbalanced distribution, and complexity. Traditional feature selection methods often fail to address these challenges effectively. In response, this research proposes a novel hybrid methodology that integrates multi-filtering techniques with the Multi-Objective Binary Horse Herd Optimization (MOBHHO) algorithm for gene selection and ensemble classification in microarray data. The study first identifies the limitations of existing methods, which motivates an approach that combines the strengths of multi-filtering and metaheuristic optimization. By using several filter methods (Information Gain, entropy, Pearson correlation, mutual information, mean absolute deviation, and weighted entropy variance), the methodology aims to reduce the bias of any single filter and make feature selection more robust. The MOBHHO wrapper then performs multi-objective optimization, minimizing the number of selected features while maximizing prediction performance. Finally, the LightGBA ensemble prediction model exploits the diverse solutions produced by MOBHHO to balance feature count against prediction accuracy. The method's feature selection and classification accuracy were evaluated on multiple high-dimensional microarray datasets: Small Round Blue Cell Tumors (SRBCT), Prostate tumors, Lung cancer, Leukemia, Colon tumor, diffuse large B-cell lymphoma (DLBCL), Lymphoma, ALL-AML-4C, ALL-AML-3C, and MLL. The experimental results show improved prediction accuracy and feature subset selection across these datasets. |
|---|---|
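The summary describes two mechanisms that a small example can make concrete. The first is a multi-filter stage. The second is a wrapper whose two objectives are the number of selected features and prediction performance. The following is a minimal sketch under stated assumptions, not the authors' implementation.

- It uses only two of the six named filters, Pearson correlation and mean absolute deviation.
- It combines the filters by average rank. The abstract does not say how the filters are combined, so this rule is an assumption.
- It keeps candidate binary feature masks that are Pareto-optimal for (number of selected features, error).
- All function names (`multi_filter`, `pareto_front`, etc.) are hypothetical.
- The error values are placeholders, not results from the paper.
- Neither the horse herd search that generates masks nor the LightGBA ensemble is shown.

```python
import math
from statistics import mean

# Hypothetical sketch: two of the six filters named in the abstract,
# combined by average rank (the aggregation rule is an assumption).

def pearson_abs(xs, ys):
    """|Pearson r| between one feature column and the class labels."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))

def mean_abs_dev(xs):
    """Mean absolute deviation of a feature column (ignores labels)."""
    m = mean(xs)
    return mean(abs(x - m) for x in xs)

def ranks(scores):
    """Rank position per feature, 0 = highest score; ties keep index order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r

def multi_filter(X, y, k):
    """Indices of the k features with the best mean rank across filters."""
    cols = list(zip(*X))  # column-wise view of the sample-by-feature matrix
    per_filter = [
        ranks([pearson_abs(c, y) for c in cols]),
        ranks([mean_abs_dev(c) for c in cols]),
    ]
    avg = [mean(r[j] for r in per_filter) for j in range(len(cols))]
    return sorted(range(len(cols)), key=lambda j: avg[j])[:k]

# Pareto dominance for the two objectives named in the abstract:
# number of selected features and prediction error, both minimized.

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates: list of (binary_mask, error); returns the non-dominated ones."""
    objs = [(sum(mask), err) for mask, err in candidates]
    return [
        cand for i, cand in enumerate(candidates)
        if not any(dominates(objs[j], objs[i]) for j in range(len(objs)) if j != i)
    ]

# Toy data (made up for illustration): 4 samples, 3 features, binary labels.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 1.0], [4.0, 5.0, 1.0]]
y = [0, 0, 1, 1]
print(multi_filter(X, y, k=2))  # → [0, 2]

# Hypothetical masks with placeholder error rates, not results from the paper.
cands = [([1, 0, 1], 0.10), ([1, 0, 0], 0.10), ([1, 1, 1], 0.05), ([0, 1, 1], 0.20)]
print([mask for mask, _ in pareto_front(cands)])  # → [[1, 0, 0], [1, 1, 1]]
```

In a full wrapper, each mask's error would come from evaluating a classifier on the selected columns, and the optimizer would propose new masks. The non-dominated set is what gives an ensemble a range of feature-count/accuracy trade-offs to draw on.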