Randomized self-updating process for clustering large-scale data
This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and n...
Saved in:
| Published in | Statistics and computing Vol. 34; no. 1 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published |
New York
Springer US
01.02.2024
Springer Nature B.V |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0960-3174 1573-1375 |
| DOI | 10.1007/s11222-023-10355-8 |
Cover
| Summary: | This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and numerous clusters. However, SUP’s reliance on pairwise dissimilarities between data points makes it computationally inefficient for large-scale data. To address this challenge, rSUP performs location updates within randomly generated data subsets at each iteration. The Law of Large Numbers guarantees that the clustering results of rSUP converge to those of the original SUP as the partition size grows. This paper demonstrates the effectiveness and computational efficiency of rSUP in large-scale data clustering through simulations and real datasets. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0960-3174 1573-1375 |
| DOI: | 10.1007/s11222-023-10355-8 |