Randomized self-updating process for clustering large-scale data

Bibliographic Details
Published in	Statistics and computing Vol. 34; no. 1
Main Authors	Shiu, Shang-Ying; Chin, Yen-Shiu; Lin, Szu-Han; Chen, Ting-Li
Format	Journal Article
Language	English
Published	New York: Springer US, 01.02.2024; Springer Nature B.V.
Subjects	Algorithms; Artificial Intelligence; Clustering; Computational efficiency; Computer Science; Data points; Effectiveness; Iterative methods; Original Paper; Probability and Statistics in Computer Science; Statistical Theory and Methods; Statistics and Computing/Statistics Programs; Randomized algorithm; Clustering analysis; Large-scale data
Online Access	Get full text
ISSN	0960-3174, 1573-1375
DOI	10.1007/s11222-023-10355-8

Summary:	This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and numerous clusters. However, SUP’s reliance on pairwise dissimilarities between data points makes it computationally inefficient for large-scale data. To address this challenge, rSUP performs location updates within randomly generated data subsets at each iteration. The Law of Large Numbers guarantees that the clustering results of rSUP converge to those of the original SUP as the partition size grows. This paper demonstrates the effectiveness and computational efficiency of rSUP in large-scale data clustering through simulations and real datasets.
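
The idea in the summary, updating point locations only within random subsets at each iteration, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The summary does not state the update rule, so the sketch assumes each point moves to a weighted mean of the subset members within a radius `r`, with weights `exp(-d / temp)`. The names `sup_step`, `rsup`, `r`, `temp`, `n_subsets`, and `n_iter` are all hypothetical and chosen for illustration only.

```python
import math
import random


def sup_step(points, r, temp):
    """One assumed self-updating step on a list of coordinate tuples.

    Each point moves to the weighted mean of all points within distance r,
    with weight exp(-d / temp). This kernel is an assumption of the sketch.
    """
    updated = []
    for p in points:
        total_w = 0.0
        acc = [0.0] * len(p)
        for q in points:
            d = math.dist(p, q)
            if d <= r:
                w = math.exp(-d / temp)
                total_w += w
                for k, qk in enumerate(q):
                    acc[k] += w * qk
        # total_w >= 1 because every point is its own neighbour (d = 0).
        updated.append(tuple(a / total_w for a in acc))
    return updated


def rsup(points, n_subsets, r, temp, n_iter, seed=0):
    """Randomized variant: at each iteration, split the points into random
    disjoint subsets and apply the update inside each subset only."""
    rng = random.Random(seed)
    current = [tuple(p) for p in points]
    for _ in range(n_iter):
        order = list(range(len(current)))
        rng.shuffle(order)
        # Deal the shuffled indices into n_subsets roughly equal subsets.
        for s in range(n_subsets):
            idx = order[s::n_subsets]
            moved = sup_step([current[i] for i in idx], r, temp)
            for i, m in zip(idx, moved):
                current[i] = m
    return current


if __name__ == "__main__":
    data = [(0.0,), (0.2,), (0.5,), (1.0,), (10.0,), (10.3,), (10.6,), (11.0,)]
    print(rsup(data, n_subsets=2, r=2.0, temp=1.0, n_iter=20))
```

Under these assumptions, each iteration computes pairwise distances only inside a subset, so the per-iteration cost falls from order n² to roughly n² / `n_subsets`. With `n_subsets=1` the sketch reduces to the plain (assumed) SUP update on the full data set.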