A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data

Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 262, p. 110249
Main Authors: Akhavan, Motahare; Hasheminejad, Seyed Mohammad Hossein
Format: Journal Article
Language: English
Published: Elsevier B.V., 28.02.2023
ISSN: 0950-7051; 1872-7409
DOI: 10.1016/j.knosys.2022.110249

Summary: Cancer diagnosis based on gene analysis is one of the main research areas in bioinformatics and machine learning. Microarray technology can simultaneously measure the expression levels of thousands of genes in a sample. However, cancer can result from mutations or expression changes in only a small number of genes, while the expression levels of most genes are essentially the same in cancerous and healthy samples. Moreover, the main challenge with microarray data is that the number of genes is very large compared with the very small number of samples. This makes gene selection an essential step in microarray analysis. In this paper, we propose a new two-phase gene selection method for microarray data. In the first phase, we take an unconventional approach: the genes, which are the features of the microarray, are treated as training samples in place of the cancerous and healthy samples, and anomaly detection is then used to greatly reduce the number of genes. In the second phase, we apply a guided genetic algorithm to the genes retained by the first phase to obtain the final set of effective genes. In our experiments, the method reduced the number of genes by at least 99% on all datasets. In addition to this very high reduction rate, classification accuracy improved significantly when using the selected genes.
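The abstract does not name the anomaly detector, the classifier behind the fitness function, or what makes the genetic algorithm "guided". The following is therefore only a minimal sketch of the two-phase idea under stated assumptions, not the authors' method. A k-nearest-neighbour distance score stands in for the anomaly detector, leave-one-out nearest-centroid accuracy with a small size penalty stands in for the fitness function, and a plain elitist genetic algorithm stands in for the guided one. All function names and parameters (`phase1_select`, `phase2_ga`, `keep`, `k`, `alpha`, `pop_size`, and so on) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def knn_anomaly_scores(points: Matrix, k: int) -> List[float]:
    """Score each point by its mean Euclidean distance to its k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        top = dists[:k]
        scores.append(sum(top) / max(len(top), 1))
    return scores


def phase1_select(x: Matrix, keep: int, k: int = 3) -> List[int]:
    """Phase 1 (assumed kNN detector): transpose so each gene becomes a 'training sample',
    then keep the `keep` genes that look most anomalous relative to the other genes."""
    genes = [list(col) for col in zip(*x)]  # genes-by-samples
    scores = knn_anomaly_scores(genes, k)
    ranked = sorted(range(len(genes)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:keep])


def loo_nearest_centroid_accuracy(x: Matrix, y: Sequence[int], genes: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    correct = 0
    for i in range(len(x)):
        centroids: Dict[int, List[float]] = {}
        for label in sorted(set(y)):
            rows = [x[j] for j in range(len(x)) if j != i and y[j] == label]
            if rows:  # a class may vanish when its only sample is held out
                centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        point = [x[i][g] for g in genes]
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == y[i]
    return correct / len(x)


def fitness(mask: List[int], x: Matrix, y: Sequence[int], cand: List[int],
            alpha: float = 0.01) -> float:
    """Assumed objective: accuracy minus a small penalty on the fraction of candidates kept."""
    genes = [g for g, bit in zip(cand, mask) if bit]
    if not genes:
        return 0.0
    return loo_nearest_centroid_accuracy(x, y, genes) - alpha * len(genes) / len(cand)


def phase2_ga(x: Matrix, y: Sequence[int], cand: List[int], pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.1,
              seed: int = 0) -> Tuple[List[int], float]:
    """Phase 2: plain elitist GA over bit masks on `cand` (stand-in for the guided GA)."""
    rng = random.Random(seed)
    n = len(cand)

    def score(mask: List[int]) -> float:
        return fitness(mask, x, y, cand)

    # Start from the full phase-1 subset plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: max(2, pop_size // 5)]
        children = [list(m) for m in elite]  # elitism: best masks survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)  # truncation selection from the elite
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return [g for g, bit in zip(cand, best) if bit], score(best)
```

For example, with `X` a samples-by-genes list of lists and `y` the class labels, `cand = phase1_select(X, keep=50)` followed by `genes, fit = phase2_ga(X, y, cand)` returns the final gene indices and their fitness, and `1 - len(genes) / len(X[0])` gives the reduction rate. Because the initial population contains the full phase-1 subset and the elite is copied unchanged every generation, the returned fitness is never lower than that of keeping every phase-1 gene.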