Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Bibliographic Details
Published in	Journal of Advanced Computational Intelligence and Intelligent Informatics Vol. 23; no. 2; pp. 362-365
Main Author	Luo, Nan-Chao
Format	Journal Article
Language	English
Published	20.03.2019
ISSN	1343-0130 (print), 1883-8014 (online)
DOI	10.20965/jaciii.2019.p0362

Abstract	Massive Web text data is high-dimensional and sparsely distributed in space, so current data mining algorithms suffer from low mining precision and long running times when they are applied to it. To solve these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. The chi-square test is used to extract feature words from the massive data and obtain a set of characteristic words. The feature sets are then clustered hierarchically, the TF-IDF value of each word in the clustering set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. Finally, K-means is applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption.
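Three of the textbook building blocks named in the abstract (chi-square feature selection, a TF-IDF vector space model, and K-means refinement of given cluster centers) can be sketched as below. This is a minimal, hypothetical illustration, not the author's implementation. It assumes labeled training documents for the chi-square test and plain length-normalized term frequency for TF-IDF. The hierarchical clustering and the bee colony algorithm with its fair and clone operations are not reproduced, because the abstract does not specify them; caller-supplied seed centroids stand in for their output. All function names are illustrative.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each word by the chi-square statistic of word presence vs.
    class membership (2x2 contingency table), keeping the max over classes.
    Assumption: documents carry class labels."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for w in vocab:
        best = 0.0
        for c in set(labels):
            # a: word present, class c    b: word present, other class
            # c_: word absent, class c    d: word absent, other class
            a = sum(1 for s, y in zip(doc_sets, labels) if w in s and y == c)
            b = sum(1 for s, y in zip(doc_sets, labels) if w in s and y != c)
            c_ = sum(1 for s, y in zip(doc_sets, labels) if w not in s and y == c)
            d = n - a - b - c_
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:
                best = max(best, n * (a * d - b * c_) ** 2 / denom)
        scores[w] = best
    return scores


def tfidf_vectors(docs, features):
    """Vector space model: one TF-IDF weight per selected feature word."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        length = len(d) or 1
        vectors.append([(tf[w] / length) * math.log(n / df[w]) if df[w] else 0.0
                        for w in features])
    return vectors


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, init_centroids, max_iter=50):
    """Standard K-means (Lloyd's algorithm) started from given centroids."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    assign = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_assign == assign:
            break  # converged: no document changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


docs = [
    "stock market shares price".split(),
    "market price shares trading".split(),
    "football match goal team".split(),
    "team goal score match".split(),
]
labels = ["finance", "finance", "sport", "sport"]

scores = chi_square_scores(docs, labels)
# 3.841 is the chi-square critical value for 1 degree of freedom at p = 0.05.
features = sorted(w for w, s in scores.items() if s >= 3.841)
vecs = tfidf_vectors(docs, features)
# Hypothetical seed centers standing in for the bee-colony output.
assign, _ = kmeans(vecs, [vecs[0], vecs[2]])
print(features)  # ['goal', 'market', 'match', 'price', 'shares', 'team']
print(assign)    # [0, 0, 1, 1]
```

The seed centroids are passed in explicitly because the abstract describes K-means as refining cluster centers produced by an earlier stage. Random seeding would also work in general, but on this toy data it could pick two identical vectors and leave one cluster empty.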
CitedBy_id	crossref_primary_10_1007_s00779_019_01257_6 crossref_primary_10_3390_make6020047 crossref_primary_10_4236_ojps_2023_131003 crossref_primary_10_3390_su131910856 crossref_primary_10_20965_jaciii_2022_p0513
Affiliation	School of Mathematics and Computer Science, Aba Teachers University, Wenchuan, Sichuan 623002, China
Discipline	Computer Science
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	cc-by-nd
PageCount	4
URI	https://doi.org/10.20965/jaciii.2019.p0362