A Semi-Supervised Learning Algorithm for Microblog Rumor Detection

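In microblog rumor detection, labeling rumors correctly costs a great deal of manpower and time, and class imbalance in the data also hinders correct identification of rumors. To address this, the paper proposes an improved method, based on the Co-Forest algorithm, for imbalanced data sets. It uses the SMOTE algorithm and stratified sampling to balance the data distribution, and a cost-sensitive weighted voting scheme to improve prediction accuracy on unlabeled samples. The method needs rumor-class labels for only a small number of training instances in order to detect rumors effectively. Empirical experiments on 10 UCI test data sets and 2 microblog rumor data sets demonstrate the effectiveness of the algorithm.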
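The two balancing ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `smote` and `cost_sensitive_vote`, the parameters `k` and `seed`, and the scoring rule (tree weight multiplied by the misclassification cost of the predicted class) are assumptions chosen for illustration. The abstract does not give the exact voting formula, and the Co-Forest self-training loop and stratified sampling are omitted.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def cost_sensitive_vote(votes, weights, cost):
    """Combine per-tree predictions into one label.

    votes   : label predicted by each tree
    weights : confidence weight of each tree (assumed given)
    cost    : cost[c] = assumed cost of misclassifying a true class-c sample
    A vote for a costly class counts more, which favours the minority class.
    """
    score = {}
    for label, weight in zip(votes, weights):
        score[label] = score.get(label, 0.0) + weight * cost[label]
    return max(score, key=score.get)


# Usage: oversample four rumor samples, then combine three tree votes.
rumors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = smote(rumors, n_new=6, k=2)
label = cost_sensitive_vote(
    ["normal", "normal", "rumor"], [0.4, 0.3, 0.5],
    {"normal": 1.0, "rumor": 2.0},
)
print(len(extra), label)  # 6 rumor
```

With equal costs the same three votes would yield "normal" (0.7 against 0.5), so the cost table is what tips the decision toward the minority class in this sketch.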

Bibliographic Details
Published in	计算机应用研究 (Application Research of Computers) Vol. 33; no. 3; pp. 744-748
Main Author	路同强 (Lu Tongqiang); 石冰 (Shi Bing); 闫中敏 (Yan Zhongmin); 周珮 (Zhou Pei)
Format	Journal Article
Language	Chinese
Published	2016. Author affiliations: School of Computer Science and Technology, Shandong University, Jinan 250101; Chinese People's Liberation Army Unit 61516, Beijing 100094
Subjects	microblog; rumor detection; imbalanced data; semi-supervised learning; Co-Forest algorithm; SMOTE; cost sensitive
ISSN	1001-3695
DOI	10.3969/j.issn.1001-3695.2016.03.024

Bibliography:	51-1196/TP. Keywords: microblog; rumor detection; imbalanced data; semi-supervised learning; Co-Forest algorithm; SMOTE; cost sensitive. Abstract: In microblog rumor detection, labeling microblog rumors correctly requires a huge amount of manpower and time. At the same time, imbalanced data categories also affect the correct recognition of microblog rumors. To resolve this problem, this paper proposes an improved method based on the Co-Forest algorithm that can be used for imbalanced data sets. The method uses the SMOTE algorithm and stratified sampling to balance the data's distribution. In addition, it improves the correct rate on unlabeled samples through a cost-sensitive weighted voting method. The method requires only a small number of training instances labeled with a rumor category, and can be used to detect rumors effectively. Experimental results on 10 UCI data sets and 2 microblog rumor data sets show that the algorithm is effective. Authors: Lu Tongqiang, Shi Bing, Yan Zhongmin, Zhou Pei (1. School of Computer Science & Technology, Shandong University,