A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data

Bibliographic Details
Published in	Knowledge-based systems Vol. 262; p. 110249
Main Authors	Akhavan, Motahare; Hasheminejad, Seyed Mohammad Hossein
Format	Journal Article
Language	English
Published	Elsevier B.V. 28.02.2023
Subjects	Anomaly detection; Feature selection; Gene selection; Genetic algorithm; Microarray
ISSN	0950-7051 1872-7409
DOI	10.1016/j.knosys.2022.110249

Summary:	Cancer diagnosis based on gene analysis is one of the main research areas in bioinformatics and machine learning. Microarray technology can measure the expression levels of thousands of genes in a sample simultaneously. However, a mutation or change in the expression of only a small number of genes can lead to cancer, and the expression level of most genes is the same in cancerous and healthy samples. The main challenge in microarray data is the high number of genes compared with the very small number of samples, which makes gene selection an essential step in microarray analysis. This paper proposes a new two-phase gene selection method for microarray data. In the first phase, the genes, which are the features of the microarray, are treated as the training samples instead of the cancerous and healthy samples, and anomaly detection is used to reduce the number of genes substantially. In the second phase, a guided genetic algorithm is applied to the genes retained by the first phase to obtain the final set of effective genes. In the experiments, the method reduces the number of genes by at least 99% on all datasets and, in addition to this very high reduction rate, significantly increases classification accuracy using the selected genes.
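
The two-phase idea in the summary can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the summary does not name the anomaly detector, the gene representation, or what makes the genetic algorithm "guided". As stand-ins, the sketch assumes a robust z-score (median/MAD) on each gene's class-mean difference for the first phase, and a plain, unguided genetic algorithm scored by leave-one-out nearest-centroid accuracy for the second. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from statistics import mean, median


def anomaly_filter(X, y, z_thresh=3.5):
    """Phase 1 (stand-in): treat each gene as one data point and keep outliers.

    Each gene is summarised by the difference of its class means; genes whose
    robust z-score (median/MAD) exceeds z_thresh are kept as candidates.
    """
    diffs = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        diffs.append(mean(pos) - mean(neg))
    med = median(diffs)
    mad = median([abs(d - med) for d in diffs]) or 1e-12  # avoid division by zero
    return [g for g, d in enumerate(diffs)
            if abs(d - med) / (1.4826 * mad) > z_thresh]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    hits = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            dist = sum((xi[g] - mean([r[g] for r in rows])) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        hits += best_label == yi
    return hits / len(X)


def genetic_select(X, y, candidates, rng, pop_size=20, generations=15,
                   penalty=0.01, p_mut=0.1):
    """Phase 2 (plain GA, assumed): evolve bit masks over the candidate genes."""
    def fitness(mask):
        genes = [g for g, bit in zip(candidates, mask) if bit]
        if not genes:
            return -1.0  # an empty selection is never acceptable
        return loo_accuracy(X, y, genes) - penalty * len(genes)

    n = len(candidates)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability p_mut
            children.append([bit != (rng.random() < p_mut) for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [g for g, bit in zip(candidates, best) if bit]


# Synthetic data (hypothetical): 12 samples, 200 genes, genes 0-3 informative.
rng = random.Random(0)
n_genes, informative = 200, 4
y = [0] * 6 + [1] * 6
X = [[rng.gauss(0, 1) + (8.0 if lab == 1 and g < informative else 0.0)
      for g in range(n_genes)] for lab in y]

candidates = anomaly_filter(X, y)
selected = genetic_select(X, y, candidates, rng)
print(len(candidates), selected, loo_accuracy(X, y, selected))
```

The fitness function subtracts a small penalty per selected gene, so the search favours compact gene sets. This is one simple way to trade reduction rate against accuracy and is an assumption here, not a detail taken from the summary.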