A semi-supervised learning algorithm for microblog rumor detection

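In microblog rumor detection, correctly labeling rumors takes a great deal of manpower and time, and class imbalance in the data also hinders correct identification of rumors. To address this, an improved method based on the Co-Forest algorithm is proposed for imbalanced datasets: it uses the SMOTE algorithm and stratified sampling to balance the data distribution, and a cost-sensitive weighted voting scheme to raise prediction accuracy on unlabeled samples. The method needs rumor-category labels for only a small number of training instances to detect rumors effectively. Experiments on 10 UCI test datasets and 2 microblog rumor datasets demonstrate the effectiveness of the algorithm.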
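As a rough illustration of the oversampling step named in the abstract, the sketch below creates synthetic minority-class (rumor) samples by SMOTE-style interpolation between a sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy points are assumptions made for illustration; this record does not give the paper's implementation or settings.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper: `minority` is a list of equal-length numeric tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between x and its chosen neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


rumor_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
extra = smote(rumor_points, n_new=10, k=2, seed=1)
print(len(extra))
```

Because each synthetic point is a convex combination of two real minority samples, it always stays inside the region those samples span, which is why the technique balances class counts without simply duplicating rows.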

Bibliographic Details
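Published in 计算机应用研究 (Application Research of Computers) Vol. 33; no. 3; pp. 744 - 748
Main Authors Lu Tongqiang (路同强), Shi Bing (石冰), Yan Zhongmin (闫中敏), Zhou Pei (周珮)
Format Journal Article
Language Chinese
Published Unit 61516 of the Chinese People's Liberation Army, Beijing 100094; School of Computer Science & Technology, Shandong University, Jinan 250101, 2016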
ISSN 1001-3695
DOI 10.3969/j.issn.1001-3695.2016.03.024

Abstract_FL In microblog rumor detection, correctly labeling rumors requires a great deal of manpower and time, and class imbalance in the data also hampers correct recognition of rumors. To address this problem, the paper proposes an improved method based on the Co-Forest algorithm for imbalanced datasets. The method uses the SMOTE algorithm and stratified sampling to balance the data distribution, and improves prediction accuracy on unlabeled samples through cost-sensitive weighted voting. It needs rumor-category labels for only a small number of training instances to detect rumors effectively. Experimental results on 10 UCI datasets and 2 microblog rumor datasets show that the algorithm is effective.
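The cost-sensitive weighted voting mentioned in the abstract can be pictured with the small sketch below: each tree in the ensemble votes for a label, the vote is scaled by a per-tree weight and a per-class cost, and the normalized winning score serves as a confidence for deciding whether to pseudo-label an unlabeled sample. The function `cost_sensitive_vote`, the weights, the cost values, and the idea of comparing the confidence with a threshold are illustrative assumptions, not the weighting scheme defined in the paper.

```python
def cost_sensitive_vote(votes, weights, class_cost):
    """Combine per-tree votes into (label, confidence).

    Hypothetical helper: votes[i] is the label predicted by tree i,
    weights[i] is that tree's positive weight, and class_cost maps a label
    to a multiplier (a larger cost favours the rarer class).
    """
    scores = {}
    for label, weight in zip(votes, weights):
        # classes missing from class_cost default to a neutral cost of 1.0
        scores[label] = scores.get(label, 0.0) + weight * class_cost.get(label, 1.0)
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("weights and costs must give a positive total score")
    best = max(scores, key=scores.get)
    return best, scores[best] / total


label, confidence = cost_sensitive_vote(
    votes=["non-rumor", "non-rumor", "rumor"],
    weights=[0.6, 0.6, 0.9],
    class_cost={"rumor": 2.0, "non-rumor": 1.0},
)
print(label, round(confidence, 2))
```

In this toy call the single rumor vote outweighs two non-rumor votes because the rumor class carries twice the cost, which is the kind of bias toward the minority class that a cost-sensitive scheme is meant to introduce.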
Author Lu Tongqiang (路同强), Shi Bing (石冰), Yan Zhongmin (闫中敏), Zhou Pei (周珮)
AuthorAffiliation School of Computer Science & Technology, Shandong University, Jinan 250101; Unit 61516 of the Chinese People's Liberation Army, Beijing 100094
Author_FL Yan Zhongmin
Zhou Pei
Lu Tongqiang
Shi Bing
Author_FL_xml – sequence: 1
  fullname: Lu Tongqiang
– sequence: 2
  fullname: Shi Bing
– sequence: 3
  fullname: Yan Zhongmin
– sequence: 4
  fullname: Zhou Pei
Author_xml – sequence: 1
  fullname: 路同强 石冰 闫中敏 周珮
ClassificationCodes TP181; TP301.6
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DOI 10.3969/j.issn.1001-3695.2016.03.024
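DatabaseName VIP Journals (维普_期刊)
Chinese Science and Technology Journal Database (中文科技期刊数据库) - CALIS site
Chinese Science and Technology Journal Database (中文科技期刊数据库) - 7.0 platform
Chinese Science and Technology Journal Database (中文科技期刊数据库) - Engineering Technology
Chinese Science and Technology Journal Database (中文科技期刊数据库) - mirror site
Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
China Online Journals (COJ)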
Discipline Computer Science
DocumentTitleAlternate Semi-supervised learning algorithm applied to microblog rumors detection
DocumentTitle_FL Semi-supervised learning algorithm applied to microblog rumors detection
EndPage 748
ExternalDocumentID jsjyyyj201603024
667958625
GrantInformation_xml – fundername: National Natural Science Foundation of China (国家自然科学基金资助项目)
  funderid: (61303005)
ISSN 1001-3695
IsPeerReviewed false
IsScholarly true
Issue 3
Keywords Co-Forest algorithm
rumor detection
imbalanced data
microblog
semi-supervised learning
cost sensitive
SMOTE
Language Chinese
Notes 51-1196/TP
Lu Tongqiang, Shi Bing, Yan Zhongmin, Zhou Pei (1. School of Computer Science & Technology, Shandong University,
PageCount 5
PublicationCentury 2000
PublicationDate 2016
PublicationDateYYYYMMDD 2016-01-01
PublicationDate_xml – year: 2016
  text: 2016
PublicationDecade 2010
PublicationTitle 计算机应用研究
PublicationTitleAlternate Application Research of Computers
PublicationYear 2016
Publisher Unit 61516 of the Chinese People's Liberation Army, Beijing 100094
School of Computer Science & Technology, Shandong University, Jinan 250101
SourceID wanfang
chongqing
SourceType Aggregation Database
Publisher
StartPage 744
SubjectTerms Co-Forest algorithm
SMOTE
imbalanced data
cost sensitive
semi-supervised learning
microblog
rumor detection
Title A semi-supervised learning algorithm for microblog rumor detection
URI http://lib.cqvip.com/qk/93231X/201603/667958625.html
https://d.wanfangdata.com.cn/periodical/jsjyyyj201603024
Volume 33