Weighted Combination of Łukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection

Bibliographic Details
Published in Computers in Biology and Medicine Vol. 170; p. 107981
Main Authors Roy, Sukriti; Singh, Joginder; Ray, Shubhra Sankar
Format Journal Article
Language English
Published United States Elsevier Ltd 01.03.2024
Elsevier Limited
ISSN 0010-4825
1879-0534
DOI 10.1016/j.compbiomed.2024.107981

Summary: A framework for gene expression analysis is developed that introduces fuzzy Jaccard similarity (FJS) and combines it with Łukasiewicz implication through weights in a hybrid ensemble framework for gene selection in cancer. The method is called weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). Fuzziness is incorporated into Jaccard similarity using the existing Gödel fuzzy logic, and the weights are obtained by maximizing the average F-score of the selected genes in classifying cancer patients. The patients are first divided into different clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score called WCLFJ (weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity). Genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by taking the union of the features selected across all clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified with Naïve Bayes (NB) and Support Vector Machine (SVM) separately using the selected genes, and the corresponding F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the classifiers. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision, and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values, and this information is further used to rank them.
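The similarity score described in the summary can be illustrated with a minimal sketch. The record does not give the exact formulas, so everything below is an assumption made for illustration: expression values are taken to be scaled to [0, 1], the Gödel minimum and maximum stand in for fuzzy intersection and union, the Łukasiewicz implication min(1, 1 - a + b) is turned into a similarity by a symmetric bi-implication averaged over genes, and a single weight `w` blends the two terms. The function names are hypothetical and are not taken from the authors' source code.

```python
from typing import Sequence


def godel_fuzzy_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Fuzzy Jaccard similarity with Goedel min/max as intersection/union.

    Assumes all values are membership degrees in [0, 1].
    """
    intersection = sum(min(a, b) for a, b in zip(x, y))
    union = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero profiles are treated as identical (an assumption).
    return intersection / union if union > 0 else 1.0


def lukasiewicz_implication(a: float, b: float) -> float:
    """Standard Lukasiewicz implication on [0, 1]."""
    return min(1.0, 1.0 - a + b)


def lukasiewicz_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed aggregation: mean over genes of the symmetric bi-implication."""
    values = [
        min(lukasiewicz_implication(a, b), lukasiewicz_implication(b, a))
        for a, b in zip(x, y)
    ]
    return sum(values) / len(values)


def wclfj_score(x: Sequence[float], y: Sequence[float], w: float = 0.5) -> float:
    """Hypothetical weighted combination; w in [0, 1] is the tunable weight."""
    return w * lukasiewicz_similarity(x, y) + (1.0 - w) * godel_fuzzy_jaccard(x, y)


if __name__ == "__main__":
    patient_a = [0.2, 0.8]
    patient_b = [0.4, 0.6]
    print(round(wclfj_score(patient_a, patient_b, w=0.5), 4))
```

In this sketch the weight `w` plays the role of the quantity that the summary says is updated iteratively to maximize the average F-score; the update rule itself is not reproduced here.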
The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and the existing literature. The genes selected by WCLFJHEF are seen to be candidates for genomic alterations in the various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
• Selection of informative genes that can help to classify normal and cancer patients.
• Introducing Gödel fuzzy logic in Jaccard similarity.
• Combining Łukasiewicz implication with fuzzy Jaccard using weights.
• Incorporating patient similarity based clustering in ensemble learning framework.
• Applying parallel hybrid approach (SVMRFE and Relief-F) in ensemble framework.
• Biologically validating selected genes using KEGG, literature, and Gene Ontology.
• Ranking selected genes using SHAP value based explainable model.
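The patient clustering step named in the summary, average linkage agglomerative clustering, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `mean_abs_distance` function is a placeholder dissimilarity (in the described method the dissimilarity would be derived from the WCLFJ score), and the function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Profile = Sequence[float]


def mean_abs_distance(x: Profile, y: Profile) -> float:
    """Placeholder dissimilarity (assumption): mean absolute difference."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def average_linkage_clusters(
    profiles: Sequence[Profile],
    n_clusters: int,
    dist: Callable[[Profile, Profile], float] = mean_abs_distance,
) -> List[List[int]]:
    """Merge the two clusters with the smallest average pairwise distance
    until n_clusters remain. Returns clusters as lists of patient indices."""
    n = len(profiles)
    # Pairwise distances between patients, keyed by (i, j) with i < j.
    pair = {(i, j): dist(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}

    def linkage(a: List[int], b: List[int]) -> float:
        # Average distance over all cross-cluster patient pairs.
        total = sum(pair[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters: List[List[int]] = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q is safe
        del clusters[q]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    toy_patients = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80], [0.88, 0.85]]
    print(average_linkage_clusters(toy_patients, n_clusters=2))
```

The number of clusters is passed in explicitly, mirroring the statement that patients are divided according to the number of patient groups. For real datasets a library routine would be preferable to this quadratic-memory sketch.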