A semi-supervised learning algorithm for microblog rumor detection
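In microblog rumor detection, labeling rumors correctly takes a great deal of manpower and time, and class imbalance in the data also hinders the correct identification of rumors. To address this, the paper proposes an improved method based on the Co-Forest algorithm for imbalanced datasets: it uses the SMOTE algorithm and stratified sampling to balance the data distribution, and a cost-sensitive weighted voting scheme to raise prediction accuracy on unlabeled samples. The method needs rumor labels for only a small number of training instances to detect rumors effectively. Experiments on 10 UCI test datasets and 2 microblog rumor datasets demonstrate the effectiveness of the algorithm.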
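The abstract names two building blocks, SMOTE oversampling and cost-sensitive weighted voting, without giving their details. The following is a minimal sketch of both ideas, not the paper's implementation: the function names `smote` and `cost_sensitive_vote`, the parameter `k`, the per-tree `weights`, and the per-class `costs` are all illustrative assumptions, and the Co-Forest training loop itself is not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def cost_sensitive_vote(votes, weights, costs):
    """Combine per-tree votes into one label (hypothetical helper).

    votes   : label predicted by each tree
    weights : confidence weight of each tree (assumed given)
    costs   : assumed misclassification cost per class; a higher cost
              makes votes for that class count more
    """
    scores = {}
    for label, weight in zip(votes, weights):
        scores[label] = scores.get(label, 0.0) + weight * costs[label]
    return max(scores, key=scores.get)
```

In this sketch, giving the rare rumor class a higher cost lets a single confident tree outvote two less costly "normal" votes, which is one plausible reading of how cost-sensitive voting could offset class imbalance when pseudo-labeling unlabeled samples.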
Published in | 计算机应用研究 (Application Research of Computers) Vol. 33; no. 3; pp. 744-748 |
---|---|
Main Author | 路同强 (Lu Tongqiang) |
Format | Journal Article |
Language | Chinese |
Published | School of Computer Science and Technology, Shandong University, Jinan 250101; Unit 61516 of the Chinese People's Liberation Army, Beijing 100094; 2016 |
Subjects | |
ISSN | 1001-3695 |
DOI | 10.3969/j.issn.1001-3695.2016.03.024 |
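Abstract | In microblog rumor detection, labeling rumors correctly takes a great deal of manpower and time, and class imbalance in the data also hinders the correct identification of rumors. To address this, the paper proposes an improved method based on the Co-Forest algorithm for imbalanced datasets: it uses the SMOTE algorithm and stratified sampling to balance the data distribution, and a cost-sensitive weighted voting scheme to raise prediction accuracy on unlabeled samples. The method needs rumor labels for only a small number of training instances to detect rumors effectively. Experiments on 10 UCI test datasets and 2 microblog rumor datasets demonstrate the effectiveness of the algorithm. |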
---|---|
Abstract_FL | In microblog rumor detection, labeling microblog rumors correctly requires a huge amount of manpower and time. At the same time, class imbalance in the data also affects the correct recognition of microblog rumors. To resolve this problem, this paper proposes an improved method based on the Co-Forest algorithm that can be applied to imbalanced datasets. The method uses the SMOTE algorithm and stratified sampling to balance the data distribution. In addition, it improves prediction accuracy on unlabeled samples through a cost-sensitive weighted voting method. The method requires rumor labels for only a small number of training instances and can detect rumors effectively. Experimental results on 10 UCI datasets and 2 microblog rumor datasets show that the algorithm is effective. |
Author | 路同强 石冰 闫中敏 周珮 |
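AuthorAffiliation | School of Computer Science and Technology, Shandong University, Jinan 250101; Unit 61516 of the Chinese People's Liberation Army, Beijing 100094 |
AuthorAffiliation_xml | – name: School of Computer Science and Technology, Shandong University, Jinan 250101; Unit 61516 of the Chinese People's Liberation Army, Beijing 100094 |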
Author_FL | Lu Tongqiang; Shi Bing; Yan Zhongmin; Zhou Pei |
Author_FL_xml | – sequence: 1 fullname: Lu Tongqiang – sequence: 2 fullname: Shi Bing – sequence: 3 fullname: Yan Zhongmin – sequence: 4 fullname: Zhou Pei |
Author_xml | – sequence: 1 fullname: 路同强 石冰 闫中敏 周珮 |
ClassificationCodes | TP181%TP301.6 |
ContentType | Journal Article |
Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
DOI | 10.3969/j.issn.1001-3695.2016.03.024 |
DatabaseName | 维普_期刊 中文科技期刊数据库-CALIS站点 中文科技期刊数据库-7.0平台 中文科技期刊数据库-工程技术 中文科技期刊数据库- 镜像站点 Wanfang Data Journals - Hong Kong WANFANG Data Centre Wanfang Data Journals 万方数据期刊 - 香港版 China Online Journals (COJ) China Online Journals (COJ) |
Discipline | Computer Science |
DocumentTitleAlternate | Semi-supervised learning algorithm applied to microblog rumors detection |
DocumentTitle_FL | Semi-supervised learning algorithm applied to microblog rumors detection |
EndPage | 748 |
ExternalDocumentID | jsjyyyj201603024 667958625 |
GrantInformation_xml | – fundername: Project supported by the National Natural Science Foundation of China funderid: (61303005) |
ISSN | 1001-3695 |
IngestDate | Thu May 29 03:54:51 EDT 2025 Wed Feb 14 10:24:40 EST 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 3 |
Keywords | Co-Forest algorithm; rumor detection; imbalanced data; microblog; semi-supervised learning; cost sensitive; SMOTE |
Language | Chinese |
Notes | 51-1196/TP microblog; rumor detection; imbalanced data; semi-supervised learning; Co-Forest algorithm; SMOTE; cost sensitive In microblog rumor detection, labeling microblog rumors correctly requires a huge amount of manpower and time. At the same time, class imbalance in the data also affects the correct recognition of microblog rumors. To resolve this problem, this paper proposes an improved method based on the Co-Forest algorithm that can be applied to imbalanced datasets. The method uses the SMOTE algorithm and stratified sampling to balance the data distribution. In addition, it improves prediction accuracy on unlabeled samples through a cost-sensitive weighted voting method. The method requires rumor labels for only a small number of training instances and can detect rumors effectively. Experimental results on 10 UCI datasets and 2 microblog rumor datasets show that the algorithm is effective. Lu Tongqiang, Shi Bing, Yan Zhongmin, Zhou Pei (1. School of Computer Science & Technology, Shandong University, |
PageCount | 5 |
ParticipantIDs | wanfang_journals_jsjyyyj201603024 chongqing_primary_667958625 |
PublicationCentury | 2000 |
PublicationDate | 2016 |
PublicationDateYYYYMMDD | 2016-01-01 |
PublicationDate_xml | – year: 2016 text: 2016 |
PublicationDecade | 2010 |
PublicationTitle | 计算机应用研究 |
PublicationTitleAlternate | Application Research of Computers |
PublicationYear | 2016 |
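Publisher | School of Computer Science and Technology, Shandong University, Jinan 250101; Unit 61516 of the Chinese People's Liberation Army, Beijing 100094 |
Publisher_xml | – name: School of Computer Science and Technology, Shandong University, Jinan 250101 – name: Unit 61516 of the Chinese People's Liberation Army, Beijing 100094 |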
StartPage | 744 |
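SubjectTerms | Co-Forest algorithm; SMOTE; imbalanced data; cost sensitive; semi-supervised learning; microblog; rumor detection |
Title | A semi-supervised learning algorithm for microblog rumor detection |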
URI | http://lib.cqvip.com/qk/93231X/201603/667958625.html https://d.wanfangdata.com.cn/periodical/jsjyyyj201603024 |
Volume | 33 |