Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Bibliographic Details
Published in	Journal of advanced computational intelligence and intelligent informatics Vol. 23; no. 2; pp. 362 - 365
Main Author	Luo, Nan-Chao
Format	Journal Article
Language	English
Published	20.03.2019
ISSN	1343-0130 1883-8014
DOI	10.20965/jaciii.2019.p0362

More Information
Summary:	Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low mining precision and long running times when applied to it. To solve these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. Feature words are extracted from the massive data with the chi-square test to obtain a set of feature words. The feature sets are clustered hierarchically, the TF-IDF value of each word in each cluster set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. For the resulting cluster centers, K-means is introduced to extract the local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption.
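
The steps named in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation: it covers only chi-square feature selection, TF-IDF vectors, and K-means refinement, and it leaves out the hierarchical clustering and the modified bee colony search entirely. The toy documents, the class labels, the function names, and the choice of starting centroids are all assumptions made for illustration; in the described method the starting centers would come from the bee colony stage.

```python
import math
from collections import Counter

def chi_square(docs, labels, term, cls):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    e = sum(1 for d, y in zip(docs, labels) if term not in d and y != cls)
    n = a + b + c + e
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    return n * (a * e - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, top_k):
    """Keep the top_k terms by their best chi-square score over all classes."""
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:top_k]

def tfidf_vectors(docs, features):
    """Vector space model: one TF-IDF weight per selected feature word."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in features}
    vectors = []
    for d in docs:
        counts, total = Counter(d), len(d) or 1
        vectors.append([counts[t] / total * math.log(n / df[t]) for t in features])
    return vectors

def kmeans(vectors, centroids, iterations=10):
    """Plain K-means started from the given centroids."""
    assign = []
    for _ in range(iterations):
        assign = [
            min(range(len(centroids)),
                key=lambda i: sum((x - y) ** 2 for x, y in zip(v, centroids[i])))
            for v in vectors
        ]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [v for v, j in zip(vectors, assign) if j == i]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else old
            )
        centroids = new_centroids
    return assign, centroids

# Hypothetical toy corpus; the labels are assumed known for the chi-square step.
docs = [
    ["football", "match", "goal"],
    ["football", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "bank"],
]
labels = ["sport", "sport", "finance", "finance"]

features = select_features(docs, labels, top_k=4)
vectors = tfidf_vectors(docs, features)
# Assumed starting centers (stand-in for the bee colony result).
assign, centroids = kmeans(vectors, [vectors[0], vectors[2]])
print(features)
print(assign)  # [0, 0, 1, 1]
```

Passing the starting centroids in as an argument keeps the K-means step separate from whatever global search supplies them, which mirrors the summary's description of K-means as a local refinement of cluster centers found earlier.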