K-means text clustering algorithm based on initial cluster centers selection according to maximum distance

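Because the initial cluster centers are chosen at random, the K-means algorithm is prone to locally optimal clustering results, unstable clustering results, and a large total number of iterations. To address these problems, a K-means text clustering algorithm that selects the initial cluster centers by the maximum distance method is proposed. The algorithm rests on the observation that the sample points farthest apart are the least likely to be assigned to the same cluster. To make the algorithm applicable to text clustering, a method for converting text similarity into text distance is constructed, and the cluster center formula and the measurement function used in the iterations are reformulated. In the experimental validation, a text set of 1,500 texts from five categories was clustered; the results show that, compared with the original K-means clustering algorithm and two other improved K-means clustering algorithms, the proposed text clustering algorithm reduces...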

Bibliographic Details
Published in Application Research of Computers (计算机应用研究) Vol. 31; no. 3; pp. 713-715
Main Authors ZHAI Dong-hai (翟东海), YU Jiang (鱼江), GAO Fei (高飞), YU Lei (于磊), DING Feng (丁锋)
Format Journal Article
Language Chinese
Published School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China 2014
Engineering School, Tibet University, Lhasa 850000, China
ISSN 1001-3695
DOI 10.3969/j.issn.1001-3695.2014.03.017

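Abstract Because the initial cluster centers are chosen at random, the K-means algorithm is prone to locally optimal clustering results, unstable clustering results, and a large total number of iterations. To address these problems, a K-means text clustering algorithm that selects the initial cluster centers by the maximum distance method is proposed. The algorithm rests on the observation that the sample points farthest apart are the least likely to be assigned to the same cluster. To make the algorithm applicable to text clustering, a method for converting text similarity into text distance is constructed, and the cluster center formula and the measurement function used in the iterations are reformulated. In the experimental validation, a text set of 1,500 texts from five categories was clustered; the results show that, compared with the original K-means clustering algorithm and two other improved K-means clustering algorithms, the proposed text clustering algorithm reduces the total clustering time while also clearly improving the F-measure.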
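The seeding idea stated in the abstract (the samples farthest apart are the least likely to share a cluster) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact selection rule, so the maximin rule used here (start from the farthest pair, then repeatedly add the sample whose nearest chosen seed is farthest away) is an assumption, as are the function name `max_distance_seeds` and the toy one-dimensional data.

```python
from itertools import combinations


def max_distance_seeds(dist, k):
    """Return k seed indices chosen from a symmetric distance matrix.

    Assumed rule (not taken from the paper): start with the farthest
    pair, then greedily add the sample whose distance to its nearest
    already-chosen seed is largest.
    """
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    # Step 1: the two samples that are farthest apart become the first seeds.
    first, second = max(combinations(range(n), 2),
                        key=lambda pair: dist[pair[0]][pair[1]])
    seeds = [first, second]
    # Step 2: keep adding the sample that is farthest from its nearest seed.
    while len(seeds) < k:
        candidates = (i for i in range(n) if i not in seeds)
        seeds.append(max(candidates,
                         key=lambda i: min(dist[i][s] for s in seeds)))
    return seeds


# Toy data: five points on a line, with absolute difference as distance.
points = [0, 1, 5, 9, 10]
dist = [[abs(a - b) for b in points] for a in points]
seeds = max_distance_seeds(dist, 3)
print(seeds)  # [0, 4, 2]
```

The selected indices would then replace the random initial centers of an ordinary K-means run; because the choice is deterministic, repeated runs start from the same centers.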
Author ZHAI Dong-hai (翟东海), YU Jiang (鱼江), GAO Fei (高飞), YU Lei (于磊), DING Feng (丁锋)
AuthorAffiliation School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China; Engineering School, Tibet University, Lhasa 850000, China
AuthorAffiliation_xml – name: School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China; Engineering School, Tibet University, Lhasa 850000, China
Author_FL ZHAI Dong-hai
YU Lei
GAO Fei
YU Jiang
DING Feng
Author_FL_xml – sequence: 1
  fullname: ZHAI Dong-hai
– sequence: 2
  fullname: YU Jiang
– sequence: 3
  fullname: GAO Fei
– sequence: 4
  fullname: YU Lei
– sequence: 5
  fullname: DING Feng
Author_xml – sequence: 1
  fullname: ZHAI Dong-hai (翟东海); YU Jiang (鱼江); GAO Fei (高飞); YU Lei (于磊); DING Feng (丁锋)
ClassificationCodes TP301.6
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Copyright_xml – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DOI 10.3969/j.issn.1001-3695.2014.03.017
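DatabaseName Chinese Journal Service Platform
Chinese Science and Technology Journal Database - CALIS site
VIP Chinese Journal Database
Chinese Science and Technology Journal Database - Engineering Technology
Chinese Science and Technology Journal Database - Mirror site
Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
Wanfang Data Journals - Hong Kong Edition
China Online Journals (COJ)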
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
DocumentTitleAlternate K-means text clustering algorithm based on initial cluster centers selection according to maximum distance
DocumentTitle_FL K-means text clustering algorithm based on initial cluster centers selection according to maximum distance
EndPage 715
ExternalDocumentID jsjyyyj201403017
48726787
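GrantInformation_xml – fundername: State Language Commission "Twelfth Five-Year Plan" Research Planning Project; Key Project of Science and Technology Research, Ministry of Education of China; Fundamental Research Funds for the Central Universities, Science and Technology Innovation Project; Tibet Autonomous Region Undergraduate Innovative Experiment Training Program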
  funderid: (YB125-49); (212167); (SWJTU12CX096); (2011CX051)
ISSN 1001-3695
IngestDate Thu May 29 03:54:49 EDT 2025
Wed Feb 14 10:37:48 EST 2024
IsPeerReviewed false
IsScholarly true
Issue 3
Keywords measurement function
text clustering
maximum distance
text distance
K-means clustering algorithm
F-measure
Language Chinese
LinkModel OpenURL
Notes 51-1196/TP
ZHAI Dong-hai, YU Jiang¹, GAO Fei², YU Lei¹, DING Feng² (1. School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China; 2. Engineering School, Tibet University, Lhasa 850000, China)
K-means clustering algorithm; maximum distance; text clustering; text distance; measurement function; F-measure
Due to the random selection of initial cluster centers, the K-means clustering algorithm is prone to local optima, unstable clustering results, and a large number of iterations. To overcome these problems, this paper selects the initial cluster centers according to maximum distance, based on the fact that the samples farthest apart are the least likely to fall in the same cluster. To apply the improved algorithm to text clustering, it constructs a method to transform text similarity into text distance, and also reconstructs the cluster center iteration formula and the measurement function. It employs a text set comprising 5 categories and 1 500 texts in the exp
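The abstract mentions transforming text similarity into text distance but does not state the formula in this record. As a hedged illustration only, the sketch below assumes cosine similarity over term-frequency vectors and the common conversion `distance = 1 - similarity`; the paper's actual transformation may differ. The helper names `cosine_similarity` and `text_distance` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    # An empty document has no direction; treat it as dissimilar to everything.
    return dot / norm if norm else 0.0


def text_distance(a, b):
    """Assumed conversion (not from the paper): distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


# Toy documents, tokenized on whitespace for illustration only.
docs = [
    "k means text clustering",
    "text clustering with k means",
    "tibet university lhasa",
]
vectors = [Counter(d.split()) for d in docs]

# Pairwise distance matrix that a distance-based seeding step could consume.
dist = [[text_distance(u, v) for v in vectors] for u in vectors]
```

With term frequencies the similarity lies in [0, 1], so the resulting distance also lies in [0, 1]: identical texts get 0 and texts with no shared terms get 1.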
PageCount 3
ParticipantIDs wanfang_journals_jsjyyyj201403017
chongqing_primary_48726787
PublicationCentury 2000
PublicationDate 2014
PublicationDateYYYYMMDD 2014-01-01
PublicationDate_xml – year: 2014
  text: 2014
PublicationDecade 2010
PublicationTitle Application Research of Computers (计算机应用研究)
PublicationTitleAlternate Application Research of Computers
PublicationTitle_FL Application Research of Computers
PublicationYear 2014
Publisher School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China
Engineering School, Tibet University, Lhasa 850000, China
Publisher_xml – name: Engineering School, Tibet University, Lhasa 850000, China
– name: School of Information Science & Technology, Southwest Jiaotong University, Chengdu 610031, China
SourceID wanfang
chongqing
SourceType Aggregation Database
Publisher
StartPage 713
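SubjectTerms F-measure
K-means clustering algorithm
text clustering
text distance
maximum distance
measurement function
Title K-means text clustering algorithm based on initial cluster centers selection according to maximum distance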
URI http://lib.cqvip.com/qk/93231X/201403/48726787.html
https://d.wanfangdata.com.cn/periodical/jsjyyyj201403017
Volume 31
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVEBS
  databaseName: EBSCOhost Academic Search Ultimate
  issn: 1001-3695
  databaseCode: ABDBF
  dateStart: 20130901
  customDbUrl: https://search.ebscohost.com/login.aspx?authtype=ip,shib&custid=s3936755&profile=ehost&defaultdb=asn
  isFulltext: true
  dateEnd: 99991231
  titleUrlDefault: https://search.ebscohost.com/direct.asp?db=asn
  omitProxy: true
  ssIdentifier: ssib025702191
  providerName: EBSCOhost
linkProvider EBSCOhost