A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data
| Published in | Knowledge-Based Systems, Vol. 262, Article 110249 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V., 28 February 2023 |
| ISSN | 0950-7051 (print); 1872-7409 (online) |
| DOI | 10.1016/j.knosys.2022.110249 |
| Summary: | Cancer diagnosis based on gene analysis is one of the main research areas in bioinformatics and machine learning. Microarray technology can measure the expression levels of thousands of genes in a sample simultaneously. However, cancer can arise from mutations or expression changes in only a small number of genes, while the expression levels of most genes are essentially the same in cancerous and healthy samples. Moreover, the main challenge in microarray data is that the number of genes far exceeds the very small number of samples, which makes gene selection an essential step in microarray analysis. In this paper, we propose a new two-phase gene selection method for microarray data. In the first phase, we take an unconventional approach: the genes, which are the features of the microarray data, are treated as the training samples in place of the cancerous and healthy samples, and anomaly detection is then used to reduce the number of genes substantially. In the second phase, we apply a guided genetic algorithm to the genes retained by the first phase to obtain the final set of effective genes. Experimental results show that our method reduces the number of genes by at least 99% on all datasets. In addition to this very high reduction rate, we significantly increased classification accuracy using the selected genes. |
|---|---|
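The two-phase idea in the summary (treat genes as data points, keep the anomalous ones, then refine with a genetic algorithm) can be illustrated with the minimal sketch below. It is not the authors' method: the abstract names neither the anomaly detector nor the guidance mechanism, so this sketch assumes a k-nearest-neighbour distance score as a stand-in detector and uses a plain, unguided GA whose fitness (leave-one-out 1-NN accuracy minus a small size penalty) is likewise an assumption. All function names, parameters, and the deliberately exaggerated toy data are hypothetical.

```python
import math
import random


def knn_anomaly_scores(X, k):
    """Score each gene by its mean Euclidean distance to its k nearest genes.

    X is a genes-by-samples matrix (list of rows), i.e. the transpose of the
    usual samples-by-genes layout, so every gene is treated as a data point.
    """
    n = len(X)
    scores = []
    for i in range(n):
        dists = sorted(math.dist(X[i], X[j]) for j in range(n) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def phase1_filter(X, k=10, keep=10):
    """Phase 1 (assumed detector): keep the `keep` highest-scoring genes."""
    scores = knn_anomaly_scores(X, k)
    ranked = sorted(range(len(X)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:keep])


def loo_1nn_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `genes`."""
    if not genes:
        return 0.0
    m = len(y)
    # Column j of X is sample j; build per-sample vectors over the chosen genes.
    samples = [[X[g][j] for g in genes] for j in range(m)]
    correct = 0
    for i in range(m):
        nearest = min((j for j in range(m) if j != i),
                      key=lambda j: math.dist(samples[i], samples[j]))
        correct += y[nearest] == y[i]
    return correct / m


def phase2_ga(X, y, candidates, pop_size=20, generations=30,
              p_mut=0.1, size_penalty=0.01, seed=0):
    """Phase 2 (plain GA, no guidance): evolve bit masks over `candidates`."""
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask):
        genes = [c for c, bit in zip(candidates, mask) if bit]
        # Reward accuracy, lightly penalise larger gene subsets.
        return loo_1nn_accuracy(X, y, genes) - size_penalty * len(genes) / n

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        nxt = scored[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            # Truncation selection: two random parents from the better half.
            a, b = rng.sample(scored[:pop_size // 2], 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Bit-flip mutation with probability p_mut per gene.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]


def make_toy_data(n_genes=200, n_per_class=10, informative=5, shift=50.0, seed=1):
    """Hypothetical toy data: the first `informative` genes shift in class 1."""
    rng = random.Random(seed)
    y = [0] * n_per_class + [1] * n_per_class
    X = [[rng.gauss(0.0, 1.0) + (shift if g < informative and label == 1 else 0.0)
          for label in y]
         for g in range(n_genes)]
    return X, y


X, y = make_toy_data()
candidates = phase1_filter(X, k=10, keep=10)
selected = phase2_ga(X, y, candidates)
print(candidates, selected, loo_1nn_accuracy(X, y, selected))
```

Transposing the matrix is what lets an ordinary outlier detector act as a gene filter: if most genes behave alike, genes whose profiles sit far from their neighbours are the candidates. The cheap filter runs first so that the more expensive wrapper search only has to explore a few candidates instead of thousands of genes.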