Parallel Clustering Optimization Algorithm Based on MapReduce in Big Data Mining
| Published in | IAENG international journal of applied mathematics Vol. 53; no. 1; pp. 1 - 7 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Hong Kong: International Association of Engineers, 01.03.2023 |
| ISSN | 1992-9978 1992-9986 |
| Summary: | Traditional data mining algorithms suffer from low computational efficiency and high memory usage, which makes them increasingly unsuitable for big data processing. This article investigates the characteristics of the Hadoop platform under the MapReduce framework and adopts the Top-K algorithm for parallel random sampling. To overcome the deficiencies of the conventional K-Medoids method in data processing, an internal replacement strategy and horizontal performance expansion are adopted. Experimental tests of the improved K-Medoids algorithm show that the optimized parallel K-Medoids clustering algorithm based on the MapReduce framework improves significantly in clustering accuracy, running time, speedup ratio, and convergence, meeting the requirements of big data mining, analysis, and processing. |
|---|---|
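
The summary names two building blocks: Top-K parallel random sampling and K-Medoids clustering. The record does not give the article's actual procedure, so the following is only a minimal single-process sketch of the general idea, under stated assumptions. It assumes that Top-K sampling means tagging each record with a random key and keeping the k largest keys (each "mapper" keeps a local top k, and a "reducer" merges them), and it uses a plain swap-based K-Medoids with Euclidean distance. All function names (`map_sample`, `reduce_sample`, `k_medoids`) are hypothetical, and the article's internal replacement strategy is not reproduced here.

```python
import heapq
import random


def map_sample(partition, k, rng):
    """Mapper step (assumed): tag each record with a random key
    and keep only the k records with the largest keys."""
    tagged = ((rng.random(), record) for record in partition)
    return heapq.nlargest(k, tagged, key=lambda pair: pair[0])


def reduce_sample(local_samples, k):
    """Reducer step (assumed): merge the mappers' candidates and keep
    the global top k, which is a uniform sample without replacement."""
    merged = (pair for sample in local_samples for pair in sample)
    top = heapq.nlargest(k, merged, key=lambda pair: pair[0])
    return [record for _, record in top]


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(distance(p, m) for m in medoids) for p in points)


def k_medoids(points, k, rng, max_iter=100):
    """Plain swap-based K-Medoids: accept any medoid/non-medoid swap
    that lowers the total cost, and stop when no swap helps."""
    medoids = rng.sample(points, k)
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    return medoids, cost


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic partitions standing in for input splits.
    centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    partitions = [
        [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(200)]
        for cx, cy in centers
    ]
    local = [map_sample(part, 30, rng) for part in partitions]
    sample = reduce_sample(local, 30)
    medoids, cost = k_medoids(sample, 3, rng)
    print(medoids, cost)
```

Clustering only the sample keeps the quadratic swap search small. In an actual Hadoop job the mapper and reducer functions would run as separate tasks over input splits rather than as list comprehensions in one process.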