Parallel Clustering Optimization Algorithm Based on MapReduce in Big Data Mining

Bibliographic Details
Published in	IAENG International Journal of Applied Mathematics, Vol. 53, no. 1, pp. 1-7
Main Authors	Zhang, Huajie; Song, Lei; Zhang, Sen
Format	Journal Article
Language	English
Published	Hong Kong: International Association of Engineers, 01.03.2023
Subjects	Accuracy; Algorithms; Big Data; Cluster analysis; Clustering; Data mining; Data processing; Design optimization; Efficiency; Optimization; Optimization algorithms; Random sampling
ISSN	1992-9978; 1992-9986

Summary:	Traditional data mining algorithms suffer from low computational efficiency and high memory usage, which makes them increasingly unsuitable for big data processing. This article investigates the characteristics of the Hadoop platform within the MapReduce framework and adopts the Top-K algorithm for parallel random sampling. To overcome the deficiencies of the conventional K-Medoids method in data processing, the algorithm is optimized with an internal replacement strategy and horizontal performance expansion. Experimental tests of the improved K-Medoids algorithm show that the optimized parallel K-Medoids clustering algorithm based on the MapReduce framework is significantly improved in clustering accuracy, running time, speedup ratio and convergence, meeting the requirements of big data mining, analysis and processing.
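The summary names two techniques without detailing them: Top-K random sampling on MapReduce and K-Medoids with an internal replacement strategy. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The map and reduce steps are simulated in one process rather than run on Hadoop; "internal replacement" is assumed to mean that each medoid may only be swapped for a member of its own cluster; and the random sample is assumed to seed the initial medoids. All function and parameter names (topk_map, topk_reduce, replace_within_clusters, parallel_k_medoids, sample_size, n_splits) are hypothetical.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Keyed = Tuple[float, Point]


def topk_map(split: Sequence[Point], k: int, rng: random.Random) -> List[Keyed]:
    # Map step: tag every record in this split with a uniform random key and
    # keep only the k records whose keys are largest.
    return heapq.nlargest(k, ((rng.random(), p) for p in split))


def topk_reduce(partials: List[List[Keyed]], k: int) -> List[Point]:
    # Reduce step: each of the k largest keys overall is also in its own split's
    # local top k, so merging the partial lists yields a uniform sample of size k.
    merged = heapq.nlargest(k, (kp for part in partials for kp in part))
    return [p for _, p in merged]


def assign(points: Sequence[Point], medoids: List[Point]) -> List[List[Point]]:
    # Assign each point to its nearest medoid by Euclidean distance.
    clusters: List[List[Point]] = [[] for _ in medoids]
    for p in points:
        j = min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))
        clusters[j].append(p)
    return clusters


def replace_within_clusters(clusters: List[List[Point]], medoids: List[Point]) -> List[Point]:
    # Assumed "internal replacement": a medoid may only be swapped for a member
    # of its own cluster, namely the member with the least total distance.
    new: List[Point] = []
    for members, m in zip(clusters, medoids):
        if not members:  # keep the old medoid if its cluster is empty
            new.append(m)
            continue
        new.append(min(members, key=lambda c: sum(math.dist(c, q) for q in members)))
    return new


def parallel_k_medoids(points: List[Point], k: int, n_splits: int = 4,
                       sample_size: int = 20, seed: int = 0, max_iter: int = 50):
    rng = random.Random(seed)
    # Simulate the input splits that Hadoop would hand to separate mappers.
    splits = [points[i::n_splits] for i in range(n_splits)]
    sample = topk_reduce([topk_map(s, sample_size, rng) for s in splits], sample_size)
    # Assumption: the initial medoids are drawn from the sample.
    medoids = rng.sample(sample, k)
    for _ in range(max_iter):
        new_medoids = replace_within_clusters(assign(points, medoids), medoids)
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)
```

For example, with two well-separated groups of four 2-D points and k = 2, the final assignment should place each group in its own cluster. Because each mapper forwards at most sample_size records, the reducer's input is bounded by sample_size times n_splits no matter how large the full dataset is, which is what makes this style of sampling attractive on MapReduce.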