Randomized self-updating process for clustering large-scale data

Bibliographic Details
Published in Statistics and Computing Vol. 34; no. 1
Main Authors Shiu, Shang-Ying, Chin, Yen-Shiu, Lin, Szu-Han, Chen, Ting-Li
Format Journal Article
Language English
Published New York: Springer US, 01.02.2024
Springer Nature B.V.
ISSN 0960-3174
1573-1375
DOI 10.1007/s11222-023-10355-8

Abstract This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and numerous clusters. However, SUP’s reliance on pairwise dissimilarities between data points makes it computationally inefficient for large-scale data. To address this challenge, rSUP performs location updates within randomly generated data subsets at each iteration. The Law of Large Numbers guarantees that the clustering results of rSUP converge to those of the original SUP as the partition size grows. This paper demonstrates the effectiveness and computational efficiency of rSUP in large-scale data clustering through simulations and real datasets.
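
The subset-wise update described in the abstract could be sketched roughly as follows. This is an illustration only, not the authors' implementation: this record does not state the update rule, so the truncated exponential weight, the parameters `r` and `T`, the equal-sized random partition, and the function names `rsup_step` and `rsup` are all assumptions.

```python
import math
import random


def rsup_step(points, n_subsets, r, T, rng):
    """One randomized update (assumed form): randomly partition the points
    into n_subsets groups, then move each point to a weighted mean of the
    points in its own group only."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    # Strided slices of the shuffled indices give a random partition.
    groups = [idx[k::n_subsets] for k in range(n_subsets)]
    new_points = list(points)
    for group in groups:
        for i in group:
            wsum = 0.0
            acc = [0.0] * len(points[i])
            for j in group:
                d = math.dist(points[i], points[j])
                if d <= r:  # assumed truncated kernel: ignore far points
                    w = math.exp(-d / T)
                    wsum += w
                    for k, v in enumerate(points[j]):
                        acc[k] += w * v
            # wsum >= 1 because each point is within distance 0 of itself.
            new_points[i] = tuple(a / wsum for a in acc)
    return new_points


def rsup(points, n_subsets=2, r=1.0, T=0.5, n_iter=50, seed=0):
    """Repeat the randomized update with a fresh partition each iteration."""
    rng = random.Random(seed)
    pts = [tuple(p) for p in points]
    for _ in range(n_iter):
        pts = rsup_step(pts, n_subsets, r, T, rng)
    return pts
```

Within a group of size m, the update compares m × m pairs rather than all n × n pairs, which is where the saving over a full pairwise pass would come from in this sketch. Because each update is a weighted average of points within distance `r`, groups of points that start farther apart than `r` never influence one another.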
ArticleNumber 47
Author Shiu, Shang-Ying (Department of Statistics, National Taipei University)
Chin, Yen-Shiu (Institute of Statistics, National Tsing Hua University)
Lin, Szu-Han (Institute of Statistical Science, Academia Sinica)
Chen, Ting-Li (Institute of Statistical Science, Academia Sinica; tlchen@stat.sinica.edu.tw)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Discipline Statistics
Mathematics
Computer Science
EISSN 1573-1375
Funding Ministry of Science and Technology, grant MOST 105-2118-M-305-003
IsPeerReviewed true
IsScholarly true
Keywords Randomized algorithm
Clustering analysis
Large-scale data
SubjectTerms Algorithms
Artificial Intelligence
Clustering
Computational efficiency
Computer Science
Data points
Effectiveness
Iterative methods
Original Paper
Probability and Statistics in Computer Science
Statistical Theory and Methods
Statistics and Computing/Statistics Programs