Research on the Classification of Unbalanced Datasets Based on Mixed Sampling (基于混合采样的非平衡数据集分类研究)

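This paper studies the problem that traditional over-sampling algorithms, while adding samples, may shrink the decision region and increase the number of noise points, and proposes a mixed-sampling algorithm based on misclassified samples. The algorithm uses an SVM as the base classifier and iterates with the AdaBoost algorithm. For the samples misclassified in each iteration, it applies an improved mixed-sampling strategy according to their spatial nearest-neighbor relationships: noise samples are deleted directly; for dangerous samples, the positive-class samples among their nearest neighbors are removed; and for safe samples, new samples are synthesized with the SMOTE algorithm and added to a new training set for retraining. Experiments on real datasets, with comparisons against the SMOTE-SVM and AdaBoost-SVM-OBMS algorithms, show that the algorithm effectively improves classification accuracy on the negative class.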
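The three-way sampling rule in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several points are assumptions, because the abstract does not state them: the negative class is taken to be the minority class, the noise/danger/safe thresholds follow the Borderline-SMOTE convention (all k neighbors positive, at least half positive, fewer than half positive), and the function names `categorize`, `smote_point`, and `hybrid_resample` are hypothetical. The SVM training and AdaBoost loop are omitted; the caller supplies the indices of the misclassified samples.

```python
import math
import random

NEG, POS = -1, +1  # assumption: NEG is the minority class, POS the majority


def k_nearest(idx, X, k):
    """Indices of the k nearest neighbours of X[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def categorize(idx, X, y, k=5):
    """Label sample idx as 'noise', 'danger' or 'safe' from its neighbours.

    Thresholds are an assumption borrowed from Borderline-SMOTE.
    """
    nbrs = k_nearest(idx, X, k)
    n_pos = sum(1 for j in nbrs if y[j] == POS)
    if n_pos == len(nbrs):       # every neighbour is positive
        return "noise", nbrs
    if 2 * n_pos >= len(nbrs):   # at least half are positive
        return "danger", nbrs
    return "safe", nbrs


def smote_point(a, b, rng):
    """One SMOTE-style synthetic point on the segment from a to b."""
    gap = rng.random()
    return tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b))


def hybrid_resample(X, y, misclassified, k=5, seed=0):
    """Apply the delete / clean / synthesize rule to misclassified NEG samples."""
    rng = random.Random(seed)
    drop = set()
    new_X, new_y = [], []
    for i in misclassified:
        if y[i] != NEG:
            continue
        kind, nbrs = categorize(i, X, y, k)
        if kind == "noise":
            drop.add(i)                                  # delete the sample itself
        elif kind == "danger":
            drop.update(j for j in nbrs if y[j] == POS)  # remove positive neighbours
        else:
            same = [j for j in nbrs if y[j] == NEG]
            if same:                                     # interpolate towards a NEG neighbour
                new_X.append(smote_point(X[i], X[rng.choice(same)], rng))
                new_y.append(NEG)
    keep = [j for j in range(len(X)) if j not in drop]
    return [X[j] for j in keep] + new_X, [y[j] for j in keep] + new_y
```

In a full pipeline, the returned training set would be passed back to the base classifier in the next boosting round. This sketch creates one synthetic point per safe sample; how many points the original algorithm generates is not stated in the abstract.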

Bibliographic Details
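Published in: 计算机应用研究 (Application Research of Computers), Vol. 32, no. 2, pp. 379-381
Main Authors: 古平 (GU Ping), 欧阳源遊 (OU YANG Yuan-you)
Format: Journal Article
Language: Chinese
Published: College of Computer Science, Chongqing University, Chongqing 400030, 2015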
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001-3695.2015.02.014

Bibliography: 51-1196/TP
GU Ping, OU YANG Yuan-you (College of Computer Science, Chongqing University, Chongqing 400030, China)
mixed-sampling; misclassified samples; unbalanced data; AdaBoost algorithm; SVM algorithm
To address the problem that traditional over-sampling algorithms may shrink the decision region and increase the number of noise points as they add samples, this paper presents a mixed-sampling algorithm based on misclassified samples. The approach uses a support vector machine as the base classifier and identifies the misclassified samples in each iteration. According to the spatial relationship between each misclassified sample and its neighbors, it applies an improved mixed-sampling strategy: noise samples are removed directly; for dangerous samples, the positive-class samples among their neighbors are excluded; and for safe samples, new samples are synthesized with the SMOTE algorithm and added to the original training set to retrain the classification model. Compared with SMOTE-SVM algori