Automatic filtering algorithm for imbalanced classification

Bibliographic Details
Published in	2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery Vol. 4; pp. 1853 - 1857
Main Authors	Wei Gong, Youjie Zhou, Hangzai Luo, Jianping Fan, Aoying Zhou
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2010
Subjects	Accuracy; Algorithm design and analysis; Feature extraction; Machine learning algorithms; Support vector machines; Training; Training data
ISBN	1424459311 9781424459315
DOI	10.1109/FSKD.2010.5569437

More Information
Summary:	Imbalanced data sets have been reported to hinder the classification performance of many machine learning algorithms in both accuracy and speed. Yet extremely imbalanced data sets (3~5% positive samples) are common in many applications, such as multimedia semantic classification. In this paper, we propose a novel algorithm that automatically removes samples that have no effect or a negative effect on classifier training from imbalanced training data sets. With our algorithm, most easy-to-classify dominant-class samples in an imbalanced training set are eliminated automatically. As a result, the ratio of minority-class samples increases significantly, making the training set more suitable for classification algorithms. Experiments show that our algorithm preserves the classification accuracy of SVM while decreasing the training time dramatically.
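
The summary does not say how "easy-to-classify" dominant-class samples are identified, so the following is only a rough sketch of the general idea, not the authors' algorithm. It substitutes a simple nearest-neighbour heuristic: a dominant-class sample is dropped when all of its k nearest neighbours also belong to the dominant class, on the assumption that such samples lie far from the decision boundary. The function names, the parameter `k`, and the toy data (5% positive samples) are all hypothetical.

```python
import math
import random


def filter_easy_majority(X, y, k=5, majority_label=0):
    """Drop majority-class samples whose k nearest neighbours are all majority-class.

    Hypothetical heuristic for illustration; not the procedure from the paper.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # minority samples are always kept
            continue
        # Brute-force distances from sample i to every other sample, O(n^2) overall.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        # Keep a majority sample only if a minority sample is among its neighbours.
        if any(label != majority_label for label in neighbour_labels):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def make_toy_data(n_major=190, n_minor=10, seed=0):
    """Build a 2-D toy set with 5% positive (minority) samples."""
    rng = random.Random(seed)
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_major)]
    X += [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


X, y = make_toy_data()
X_f, y_f = filter_easy_majority(X, y, k=5)
ratio_before = sum(y) / len(y)        # minority share before filtering
ratio_after = sum(y_f) / len(y_f)     # minority share after filtering
print(f"{len(y)} -> {len(y_f)} samples, "
      f"minority ratio {ratio_before:.3f} -> {ratio_after:.3f}")
```

The filtered set could then be passed to any classifier, such as an SVM. Whether accuracy is preserved depends on the data and the filtering rule, and this sketch makes no claim to reproduce the reported results.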