Research on an Over-Sampling Algorithm Based on Feature Selection

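To improve the classification accuracy of the minority class in imbalanced data sets, an over-sampling algorithm based on feature selection is proposed. The algorithm takes into account that different feature columns contribute differently to classification performance. It first performs feature selection on the training set to pick a set of feature columns, then synthesizes minority-class samples according to the selected columns. The features of each synthesized minority-class sample consist of two parts: the features corresponding to the selected feature columns, and features synthesized according to the SMOTE principle. The proposed algorithm is compared experimentally with SMOTE; the results show that it outperforms SMOTE, effectively reduces the imbalance of the data, and improves the classification accuracy of the minority class.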
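The two-part synthesis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the record does not confirm: the selected columns are copied unchanged from the seed minority sample, the remaining columns are interpolated toward a random one of the seed's k nearest minority neighbours (the usual SMOTE step), and the feature-selection stage is replaced by a caller-supplied index list. The function name `fs_oversample` and its parameters are hypothetical.

```python
import numpy as np

def fs_oversample(X_min, selected_cols, n_new, k=5, seed=0):
    """Hypothetical sketch: synthesize n_new minority samples.

    Assumption: selected columns are copied from the seed sample;
    all other columns are SMOTE-interpolated toward a neighbour.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Boolean mask marking the columns chosen by feature selection.
    selected = np.zeros(d, dtype=bool)
    selected[list(selected_cols)] = True

    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, d))
    for i in range(n_new):
        base = rng.integers(n)                    # seed minority sample
        nb = neighbours[base, rng.integers(k)]    # one of its k neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        new = X_min[base] + gap * (X_min[nb] - X_min[base])
        new[selected] = X_min[base, selected]     # assumption: keep selected columns as-is
        synthetic[i] = new
    return synthetic

# Usage: column 2 plays the role of a selected feature column.
X_min = np.array([[0.0, 1.0, 10.0],
                  [1.0, 2.0, 10.0],
                  [2.0, 3.0, 20.0],
                  [3.0, 5.0, 20.0]])
new_samples = fs_oversample(X_min, selected_cols=[2], n_new=6, k=2)
```

In this sketch the selected columns of every synthetic sample take values that already occur in the minority class, while the other columns fall between a sample and one of its neighbours. The subject headings list genetic algorithms, but the record does not describe the selection procedure, so the sketch leaves it out.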

Bibliographic Details
Published in	Telecommunications Science (电信科学) Vol. 28; no. 1; pp. 87-91
Main Author	Lu Huijuan (陆慧娟); Zhang Jinwei (张金伟); Ma Xiaoping (马小平); Yang Xiaobing (杨小兵)
Format	Journal Article
Language	Chinese
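Published	China Institute of Communications (中国通信学会) 2012; Posts & Telecom Press Co., Ltd. (人民邮电出版社有限公司)
Affiliations	College of Information Engineering, China Jiliang University, Hangzhou 310018; School of Information and Electrical Engineering, China University of Mining & Technology, Xuzhou 221008
Subjects	imbalanced data set; feature selection; over-sampling; genetic algorithm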
ISSN	1000-0801
DOI	10.3969/j.issn.1000-0801.2012.01.017

More Information
Bibliography:	To significantly improve the classification performance of the minority class, we present an over-sampling method based on feature selection. First, feature selection is performed on the training data set to select a set of key columns. Minority-class samples are then produced using the selected key columns, and each sample consists of two kinds of features: one kind is the feature values corresponding to the selected key columns, and the other is generated according to the principle of SMOTE. Experimental comparison shows that the new method performs better than SMOTE, effectively reducing the imbalance of the data and improving the classification accuracy of the minority class.
	11-2103/TN
	Lu Huijuan, Zhang Jinwei, Ma Xiaoping, Yang Xiaobing (1. School of Information and Electrical Engineering, China University of Mining & Technology, Xuzhou 221008, China; 2. College of Information Engineering, China Jiliang University, Hangzhou 310018, China)
	imbalanced data set, featu