Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Bibliographic Details
Published in Journal of advanced computational intelligence and intelligent informatics Vol. 23; no. 2; pp. 362-365
Main Author Luo, Nan-Chao
Format Journal Article
Language English
Published 20.03.2019
ISSN 1343-0130
1883-8014
DOI 10.20965/jaciii.2019.p0362

Abstract Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low precision and long running times when applied to massive Web text collections. To address these problems, a clustering-based mining algorithm for massive Web text data is proposed. A chi-square test is used to extract feature words from the data, yielding a set of feature words. The feature sets are clustered hierarchically, the TF-IDF value of each word in each cluster set is calculated, and a vector space model is constructed. Fair and clone operations are introduced into the bee colony algorithm to improve the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves mining accuracy and reduces time consumption.
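The first two stages named in the abstract, chi-square feature selection and TF-IDF weighting, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of class labels for the chi-square test, the tokenised-list input, and the plain tf * log(N/df) weighting are all assumptions made for illustration, and the hierarchical clustering, bee colony, and K-means stages are omitted.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, target):
    """Chi-square statistic of `term` versus class `target` (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if has_term and label == target:
            a += 1          # term present, document in target class
        elif has_term:
            b += 1          # term present, document in another class
        elif label == target:
            c += 1          # term absent, document in target class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def tfidf_vectors(docs, features):
    """Build one TF-IDF vector per document over the selected feature words."""
    n = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in features}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[t] / total) * math.log(n / df[t]) if df[t] else 0.0
            for t in features
        ])
    return vectors
```

Called on a small labelled corpus of token lists, select_features returns the feature-word set and tfidf_vectors returns the rows of a vector space model that a clustering step could then consume.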
Author Luo, Nan-Chao
Author_xml – sequence: 1
  givenname: Nan-Chao
  surname: Luo
  fullname: Luo, Nan-Chao
CitedBy_id crossref_primary_10_1007_s00779_019_01257_6
crossref_primary_10_3390_make6020047
crossref_primary_10_4236_ojps_2023_131003
crossref_primary_10_3390_su131910856
crossref_primary_10_20965_jaciii_2022_p0513
Cites_doi 10.20965/jaciii.2017.p1262
10.1007/s11042-015-2649-7
10.1007/s10618-015-0433-y
10.1364/OE.26.012948
10.1080/00207543.2016.1244615
ContentType Journal Article
CorporateAuthor School of Mathematics and Computer Science, Aba Teachers University Wenchuan, Sichuan 623002, China
CorporateAuthor_xml – name: School of Mathematics and Computer Science, Aba Teachers University Wenchuan, Sichuan 623002, China
DBID AAYXX
CITATION
ADTOC
UNPAY
DOI 10.20965/jaciii.2019.p0362
DatabaseName CrossRef
Unpaywall for CDI: Periodical Content
Unpaywall
DatabaseTitle CrossRef
DatabaseTitleList CrossRef
Database_xml – sequence: 1
  dbid: UNPAY
  name: Unpaywall
  url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1883-8014
EndPage 365
ExternalDocumentID 10.20965/jaciii.2019.p0362
10_20965_jaciii_2019_p0362
IEDL.DBID UNPAY
ISSN 1343-0130
1883-8014
IngestDate Tue Aug 19 23:19:20 EDT 2025
Thu Apr 24 23:05:16 EDT 2025
Wed Oct 01 05:08:49 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License cc-by-nd
LinkModel DirectLink
OpenAccessLink https://proxy.k.utb.cz/login?url=https://doi.org/10.20965/jaciii.2019.p0362
PageCount 4
ParticipantIDs unpaywall_primary_10_20965_jaciii_2019_p0362
crossref_citationtrail_10_20965_jaciii_2019_p0362
crossref_primary_10_20965_jaciii_2019_p0362
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2019-03-20
PublicationDateYYYYMMDD 2019-03-20
PublicationDate_xml – month: 03
  year: 2019
  text: 2019-03-20
  day: 20
PublicationDecade 2010
PublicationTitle Journal of advanced computational intelligence and intelligent informatics
PublicationYear 2019
References_xml – ident: key-10.20965/jaciii.2019.p0362-5
  doi: 10.20965/jaciii.2017.p1262
– ident: key-10.20965/jaciii.2019.p0362-3
  doi: 10.1007/s11042-015-2649-7
– ident: key-10.20965/jaciii.2019.p0362-6
– ident: key-10.20965/jaciii.2019.p0362-7
– ident: key-10.20965/jaciii.2019.p0362-4
– ident: key-10.20965/jaciii.2019.p0362-2
  doi: 10.1007/s10618-015-0433-y
– ident: key-10.20965/jaciii.2019.p0362-8
  doi: 10.1364/OE.26.012948
– ident: key-10.20965/jaciii.2019.p0362-1
  doi: 10.1080/00207543.2016.1244615
SSID ssj0001326041
ssib051641541
Score 2.1283548
SourceID unpaywall
crossref
SourceType Open Access Repository
Enrichment Source
Index Database
StartPage 362
Title Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm
URI https://doi.org/10.20965/jaciii.2019.p0362
UnpaywallVersion publishedVersion
Volume 23
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: Directory of Open Access Journals
  customDbUrl:
  eissn: 1883-8014
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0001326041
  issn: 1883-8014
  databaseCode: DOA
  dateStart: 20070101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 1883-8014
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssib051641541
  issn: 1343-0130
  databaseCode: M~E
  dateStart: 19970101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre