A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data

Bibliographic Details
Published in Computational biology and chemistry Vol. 80; pp. 121 - 127
Main Authors He, Yuanyu, Zhou, Junhai, Lin, Yaping, Zhu, Tuanfei
Format Journal Article
Language English
Published England Elsevier Ltd 01.06.2019
ISSN 1476-9271
1476-928X
DOI 10.1016/j.compbiolchem.2019.03.017

Summary: DNA microarray data has been widely used in cancer research because it helps to distinguish between tumor classes. However, typical gene expression data is high-dimensional and class-imbalanced, which makes it very difficult for traditional machine learning methods to construct a robust classifier that performs well on both the minority and majority classes. As one of the most successful feature weighting techniques, Relief is considered particularly well suited to high-dimensional problems. Unfortunately, almost no Relief-based method takes the imbalanced class distribution into account. This study finds that existing Relief-based algorithms may underestimate features that discriminate the minority classes, and that they ignore the distribution characteristics of minority class samples. As a result, an additional bias towards classification into the majority classes can be introduced. To address this, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief can correct the bias towards the majority classes, and it considers the scattered distribution of minority class samples when estimating feature weights. In this way, imRelief rewards features that perform well at separating the minority classes from other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection applications.
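
The summary describes Relief-style feature weighting only at a high level. As a rough illustration, the Python sketch below implements the classic two-class Relief idea (nearest hit and nearest miss), which is the baseline the summary refers to. It is not the authors' imRelief, whose details are not given in this record. The function name relief_weights, the Manhattan distance, the min-max scaling, and the toy data are all assumptions made for the example.

```python
import numpy as np

def relief_weights(X, y, n_iter=None, seed=0):
    """Classic two-class Relief (NOT imRelief): reward features that differ
    on the nearest miss and penalize features that differ on the nearest hit.
    Assumes every class contains at least two samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Min-max scale each feature so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # avoid division by zero for constant features
    Xs = (X - lo) / span

    rng = np.random.default_rng(seed)
    n_iter = n_samples if n_iter is None else n_iter
    picks = rng.choice(n_samples, size=n_iter, replace=n_iter > n_samples)

    weights = np.zeros(n_features)
    for i in picks:
        diffs = np.abs(Xs - Xs[i])  # per-feature difference to every sample
        dist = diffs.sum(axis=1)    # Manhattan distance (an assumed choice)
        dist[i] = np.inf            # never select the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        weights += (diffs[miss] - diffs[hit]) / n_iter
    return weights

# Hypothetical imbalanced toy data: 6 majority samples, 2 minority samples.
# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.1, 0.6],
     [0.0, 0.8], [0.2, 0.4], [0.9, 0.5], [1.0, 0.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

weights = relief_weights(X, y)
print(weights)
```

On this toy data, feature 0 should receive the larger weight. Because the update averages over sampled instances, a minority class with few samples contributes few updates in this baseline, which is the kind of bias the summary says imRelief is designed to correct.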