Learning from Imbalanced Data in Presence of Noisy and Borderline Examples

Bibliographic Details
Published in: Rough Sets and Current Trends in Computing, pp. 158-167
Main Authors: Napierała, Krystyna; Stefanowski, Jerzy; Wilk, Szymon
Format: Book Chapter
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg, 2010
Series: Lecture Notes in Computer Science
ISBN: 9783642135286, 3642135285
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-642-13529-3_18

Summary: In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. Results showed that if the data was sufficiently disturbed by these factors, the focused re-sampling methods (NCR and our SPIDER2) strongly outperformed the oversampling methods. They also performed better on real-life data, where PCA visualizations suggested the possible existence of noisy examples and large overlapping areas between classes.