ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data


Bibliographic Details
Published in Neurocomputing (Amsterdam) Vol. 101; pp. 309-318
Main Authors Yu, Hualong; Ni, Jun; Zhao, Jing
Format Journal Article
Language English
Published Amsterdam Elsevier B.V 04.02.2013
Elsevier
ISSN 0925-2312
1872-8286
DOI 10.1016/j.neucom.2012.08.018

Summary: Class imbalance occurs frequently in DNA microarray data and causes poor prediction performance on minority classes. Other characteristics of such data, such as high dimensionality, small sample size, and high noise, make the problem worse. In this study, we propose ACOSampling, a novel undersampling method based on ant colony optimization (ACO), to address this problem. The algorithm first applies feature selection to eliminate noisy genes. The original training set is then randomly and repeatedly divided into two groups: a training set and a validation set. In each division, a modified ACO algorithm, a variant of our previous work, filters out less informative majority samples and searches for the corresponding optimal training sample subset. Finally, the statistics from all locally optimal training sample subsets are compiled into a frequency list, in which each frequency indicates the importance of the corresponding majority sample. Only the high-frequency majority samples are extracted and combined with all minority samples to construct the final balanced training set. We evaluated the method on four benchmark skewed DNA microarray datasets with a support vector machine (SVM) classifier; the proposed method outperformed many other sampling approaches.

► The ACO algorithm is modified for undersampling skewed DNA microarray data.
► The significance of each majority sample is estimated by a ranked frequency list.
► ACOSampling increases classification performance but takes more time.
► Selecting a few feature genes helps to improve classification performance.
► Some classification tasks are harmful and the others are unharmful.
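The procedure described in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the summary does not give the pheromone update rule, the fitness function, or the frequency threshold, so all of these are assumptions here. In particular, a nearest-centroid classifier stands in for the SVM, the geometric mean of the per-class accuracies is an assumed fitness, the feature selection step is skipped, and all function and parameter names (`aco_select`, `aco_sampling_sketch`, `rho`, `n_ants`, and so on) are hypothetical.

```python
import random

def centroid(rows):
    # Component-wise mean of a list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(maj_train, min_train, maj_val, min_val):
    # Assumed fitness: G-mean of a nearest-centroid classifier on the
    # validation split (a lightweight stand-in for the SVM in the paper).
    c_maj, c_min = centroid(maj_train), centroid(min_train)
    tn = sum(sq_dist(x, c_maj) <= sq_dist(x, c_min) for x in maj_val)
    tp = sum(sq_dist(x, c_min) < sq_dist(x, c_maj) for x in min_val)
    return ((tn / len(maj_val)) * (tp / len(min_val))) ** 0.5

def aco_select(ids, majority, min_train, maj_val, min_val, rng,
               n_ants=10, n_iter=10, rho=0.2):
    # Simplified binary ACO (assumption): tau[i] is the probability that
    # an ant keeps majority sample i in its candidate training subset.
    tau = {i: 0.5 for i in ids}
    best_set, best_fit = list(ids), -1.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            kept = [i for i in ids if rng.random() < tau[i]]
            if not kept:  # never evaluate an empty subset
                kept = [rng.choice(ids)]
            fit = fitness([majority[i] for i in kept],
                          min_train, maj_val, min_val)
            if fit > best_fit:
                best_set, best_fit = kept, fit
        best = set(best_set)
        for i in ids:
            # Evaporate, reinforce the best subset so far, then clamp.
            tau[i] = (1 - rho) * tau[i] + rho * (1.0 if i in best else 0.0)
            tau[i] = min(0.9, max(0.1, tau[i]))
    return best_set

def aco_sampling_sketch(majority, minority, n_splits=20, val_frac=0.3, seed=0):
    rng = random.Random(seed)
    freq = [0] * len(majority)  # how often each majority sample is selected
    for _ in range(n_splits):
        maj_ids = list(range(len(majority)))
        min_ids = list(range(len(minority)))
        rng.shuffle(maj_ids)
        rng.shuffle(min_ids)
        n_maj_val = max(1, int(val_frac * len(maj_ids)))
        n_min_val = max(1, int(val_frac * len(min_ids)))
        maj_val = [majority[i] for i in maj_ids[:n_maj_val]]
        min_val = [minority[i] for i in min_ids[:n_min_val]]
        min_train = [minority[i] for i in min_ids[n_min_val:]]
        chosen = aco_select(maj_ids[n_maj_val:], majority,
                            min_train, maj_val, min_val, rng)
        for i in chosen:
            freq[i] += 1
    # Keep the most frequently selected majority samples, as many as there
    # are minority samples, and combine them with all minority samples.
    ranked = sorted(range(len(majority)), key=lambda i: freq[i], reverse=True)
    keep = sorted(ranked[:len(minority)])
    return [majority[i] for i in keep] + list(minority), freq
```

Calling `aco_sampling_sketch(majority, minority)` with two lists of feature vectors (at least two samples in each) returns the balanced training set and the frequency list. Cutting the ranking at the minority class size is one simple way to obtain a balanced set; the paper may use a different criterion.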