Parallel Clustering Optimization Algorithm Based on MapReduce in Big Data Mining

Bibliographic Details
Published in IAENG International Journal of Applied Mathematics, Vol. 53, no. 1, pp. 1-7
Main Authors Zhang, Huajie; Song, Lei; Zhang, Sen
Format Journal Article
Language English
Published Hong Kong: International Association of Engineers, 01.03.2023
ISSN 1992-9978, 1992-9986

Summary: Traditional data mining algorithms suffer from low computational efficiency and high memory usage, which makes them increasingly unsuitable for big data processing. This article examines the characteristics of the Hadoop platform under the MapReduce framework and adopts the Top-K algorithm for parallel random sampling. To overcome the shortcomings of the conventional K-Medoids method, the algorithm is optimized with an internal replacement strategy and horizontal performance expansion. Experiments with the improved K-Medoids algorithm show that the optimized parallel K-Medoids clustering algorithm based on MapReduce improves significantly in clustering accuracy, running time, speedup ratio, and convergence, meeting the requirements of big data mining, analysis, and processing.
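
The two ideas named in the summary, Top-K parallel random sampling and K-Medoids clustering, can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Hadoop: the function names (`map_sample`, `reduce_sample`, `k_medoids`) are hypothetical, the sampling uses the common "tag each record with a random key, keep the K largest" scheme as an assumed reading of Top-K sampling, and the medoid update is a plain alternating scheme rather than the article's internal replacement strategy.

```python
import heapq
import random
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Tagged = Tuple[float, Point]


def map_sample(partition: Iterable[Point], k: int, rng: random.Random) -> List[Tagged]:
    """Map step (assumed scheme): tag each record with a random key and
    keep only the k records with the largest keys in this partition."""
    tagged = ((rng.random(), p) for p in partition)
    return heapq.nlargest(k, tagged, key=lambda t: t[0])


def reduce_sample(local_tops: Iterable[List[Tagged]], k: int) -> List[Point]:
    """Reduce step: merge the per-partition candidates and keep the
    global top k, which yields a random sample of size k."""
    merged = (t for top in local_tops for t in top)
    return [p for _, p in heapq.nlargest(k, merged, key=lambda t: t[0])]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_medoids(points: Sequence[Point], k: int, rng: random.Random,
              max_iter: int = 50) -> List[Point]:
    """Plain alternating K-Medoids: assign every point to its nearest
    medoid, then re-pick each cluster's medoid as the member with the
    smallest total distance to the other members."""
    medoids = rng.sample(list(points), k)
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        new_medoids = []
        for i in range(k):
            # Keep the old medoid if duplicate points left a cluster empty.
            members = clusters[i] or [medoids[i]]
            new_medoids.append(
                min(members, key=lambda c: sum(dist(c, q) for q in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

In a real MapReduce job each call to `map_sample` would run on a separate input split and `reduce_sample` in a single reducer; clustering the sample first is one plausible way to obtain initial medoids cheaply before processing the full data set.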