Randomized self-updating process for clustering large-scale data

Bibliographic Details
Published in: Statistics and Computing, Vol. 34, no. 1
Main Authors: Shiu, Shang-Ying; Chin, Yen-Shiu; Lin, Szu-Han; Chen, Ting-Li
Format: Journal Article
Language: English
Published: New York: Springer US, 01.02.2024
Springer Nature B.V
ISSN: 0960-3174, 1573-1375
DOI: 10.1007/s11222-023-10355-8

Summary: This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and numerous clusters. However, SUP’s reliance on pairwise dissimilarities between data points makes it computationally inefficient for large-scale data. To address this challenge, rSUP performs location updates within randomly generated data subsets at each iteration. The Law of Large Numbers guarantees that the clustering results of rSUP converge to those of the original SUP as the partition size grows. This paper demonstrates the effectiveness and computational efficiency of rSUP in large-scale data clustering through simulations and real datasets.
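
The idea in the summary, location updates performed only within random subsets that are redrawn at each iteration, can be illustrated with a minimal Python sketch. The summary does not state the update rule, so the sketch assumes one: each point moves to a weighted mean of the points in its subset, with a truncated exponential weight. The function names `sup_step` and `rsup` and the parameters `r`, `T`, `n_subsets`, `n_iter`, and `seed` are hypothetical and are not taken from the paper.

```python
import math
import random


def sup_step(points, r, T):
    """One self-updating step on a list of points (tuples of floats).

    Assumed rule (not stated in the summary): each point moves to the
    weighted mean of all points within distance r, weight exp(-d / T).
    """
    updated = []
    for p in points:
        weights = []
        for q in points:
            d = math.dist(p, q)
            weights.append(math.exp(-d / T) if d <= r else 0.0)
        total = sum(weights)  # > 0, since p is its own neighbour (d = 0)
        updated.append(tuple(
            sum(w * q[k] for w, q in zip(weights, points)) / total
            for k in range(len(p))
        ))
    return updated


def rsup(points, n_subsets, r, T, n_iter=50, seed=0):
    """Randomized variant: update locations within random subsets only."""
    rng = random.Random(seed)
    pts = [tuple(p) for p in points]
    for _ in range(n_iter):
        # Draw a fresh random partition of the indices at every iteration.
        order = list(range(len(pts)))
        rng.shuffle(order)
        for s in range(n_subsets):
            idx = order[s::n_subsets]
            moved = sup_step([pts[i] for i in idx], r, T)
            for i, m in zip(idx, moved):
                pts[i] = m
    return pts
```

Under these assumptions, one step on the full data compares all n² pairs, while a partition into `n_subsets` groups compares roughly n²/`n_subsets` pairs per iteration, which is the kind of saving the summary attributes to rSUP. Points that end up at (nearly) the same location after the iterations would be read as one cluster.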