Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm
| Published in | Journal of advanced computational intelligence and intelligent informatics Vol. 23; no. 2; pp. 362 - 365 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | 20.03.2019 |
| ISSN | 1343-0130 1883-8014 |
| DOI | 10.20965/jaciii.2019.p0362 |
| Abstract | Massive Web text data is high-dimensional and sparsely distributed in feature space, so current data mining algorithms applied to it suffer from low mining precision and long running times. To solve these problems, a massive data mining algorithm for Web text based on clustering is proposed. Using the chi-square test, feature words are extracted from the massive data to obtain a set of feature words. The feature set is hierarchically clustered, the TF-IDF value of each word in the cluster set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. For the resulting cluster centers, K-means is introduced to extract the local centroid and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption. |
|---|---|
| Author | Luo, Nan-Chao |
| CitedBy_id | crossref_primary_10_1007_s00779_019_01257_6 crossref_primary_10_3390_make6020047 crossref_primary_10_4236_ojps_2023_131003 crossref_primary_10_3390_su131910856 crossref_primary_10_20965_jaciii_2022_p0513 |
| Cites_doi | 10.20965/jaciii.2017.p1262 10.1007/s11042-015-2649-7 10.1007/s10618-015-0433-y 10.1364/OE.26.012948 10.1080/00207543.2016.1244615 |
| ContentType | Journal Article |
| CorporateAuthor | School of Mathematics and Computer Science, Aba Teachers University, Wenchuan, Sichuan 623002, China |
| Discipline | Computer Science |
| EISSN | 1883-8014 |
| EndPage | 365 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | cc-by-nd |
| PageCount | 4 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-03-20 |
| PublicationTitle | Journal of advanced computational intelligence and intelligent informatics |
| PublicationYear | 2019 |
| StartPage | 362 |
| Title | Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm |
| URI | https://doi.org/10.20965/jaciii.2019.p0362 |
| UnpaywallVersion | publishedVersion |
| Volume | 23 |
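
The abstract names three standard building blocks: chi-square scoring of feature words, a TF-IDF vector space model, and K-means refinement of cluster centers. The following is a minimal sketch of those generic techniques only, not the author's implementation. It assumes that documents are already tokenized, that chi-square is computed from a 2x2 term/category contingency table (the abstract does not say how categories are obtained), and that the initial centers passed to K-means stand in for the output of the modified bee colony search, which is not reproduced here. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/category table (assumed setup).

    n11: docs in the category that contain the term
    n10: docs outside the category that contain the term
    n01: docs in the category that lack the term
    n00: docs outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def tfidf_vectors(docs, vocab):
    """Map each tokenized document to a TF-IDF vector over vocab."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty docs
        vectors.append([
            (tf[t] / length) * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans_refine(points, centers, iterations=20):
    """Refine given initial centers with plain K-means (Lloyd's) iterations."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep empty ones.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else old
            for g, old in zip(groups, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy data (hypothetical): two documents per topic.
docs = [
    ["web", "text", "mining"],
    ["web", "text", "cluster"],
    ["bee", "colony", "search"],
    ["bee", "colony", "clone"],
]
vocab = ["web", "text", "bee", "colony"]  # assumed chi-square-selected words
vectors = tfidf_vectors(docs, vocab)
# Initial centers are a placeholder for the bee colony search result.
centers, labels = kmeans_refine(vectors, [vectors[0], vectors[2]])
print(labels)
```

In this sketch the feature vocabulary is fixed by hand; in practice one would rank candidate words by `chi_square` and keep the top-scoring ones before building the vectors.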