Randomized self-updating process for clustering large-scale data

Bibliographic Details
Published in	Statistics and Computing, Vol. 34, no. 1
Main Authors	Shiu, Shang-Ying; Chin, Yen-Shiu; Lin, Szu-Han; Chen, Ting-Li
Format	Journal Article
Language	English
Published	New York: Springer US (Springer Nature B.V.), 1 February 2024
Subjects	Algorithms; Artificial Intelligence; Clustering; Computational efficiency; Computer Science; Data points; Effectiveness; Iterative methods; Original Paper; Probability and Statistics in Computer Science; Statistical Theory and Methods; Statistics and Computing/Statistics Programs; Randomized algorithm; Clustering analysis; Large-scale data
ISSN	0960-3174 (print); 1573-1375 (electronic)
DOI	10.1007/s11222-023-10355-8

Abstract	This paper introduces the randomized self-updating process (rSUP) algorithm for clustering large-scale data. rSUP is an extension of the self-updating process (SUP) algorithm, which has shown effectiveness in clustering data with characteristics such as noise, varying cluster shapes and sizes, and numerous clusters. However, SUP’s reliance on pairwise dissimilarities between data points makes it computationally inefficient for large-scale data. To address this challenge, rSUP performs location updates within randomly generated data subsets at each iteration. The Law of Large Numbers guarantees that the clustering results of rSUP converge to those of the original SUP as the partition size grows. This paper demonstrates the effectiveness and computational efficiency of rSUP in large-scale data clustering through simulations and real datasets.
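The subset-based update described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so the sketch assumes a SUP-style step in which each point moves to a weighted mean of nearby points under a truncated exponential kernel. The function name `rsup_sketch` and the parameters `r`, `temperature`, and `n_subsets` are hypothetical and are not taken from the paper or from any package.

```python
import numpy as np


def rsup_sketch(points, r, temperature, n_subsets, n_iter=50, seed=0):
    """Illustrative rSUP-style iteration (assumed update rule, not the paper's exact one)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(points, dtype=float).copy()
    n = x.shape[0]
    for _ in range(n_iter):
        # Draw a fresh random partition of the point indices at every iteration.
        perm = rng.permutation(n)
        new_x = np.empty_like(x)
        for idx in np.array_split(perm, n_subsets):
            sub = x[idx]
            # Pairwise distances are computed only within the subset,
            # which avoids the full n-by-n dissimilarity matrix.
            diff = sub[:, None, :] - sub[None, :, :]
            dist = np.sqrt((diff ** 2).sum(axis=-1))
            # Assumed influence function: exponential decay, cut off beyond radius r.
            w = np.where(dist <= r, np.exp(-dist / temperature), 0.0)
            # Each point moves to the weighted mean of its subset.
            # The self-weight is 1, so the denominator is never zero.
            new_x[idx] = (w @ sub) / w.sum(axis=1, keepdims=True)
        x = new_x
    return x
```

Under these assumptions, each iteration costs on the order of n^2 / n_subsets distance evaluations instead of n^2. Points that end at (numerically) the same location would be read as one cluster.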
ArticleNumber	47
Author	Shiu, Shang-Ying (Department of Statistics, National Taipei University); Chin, Yen-Shiu (Institute of Statistics, National Tsing Hua University); Lin, Szu-Han (Institute of Statistical Science, Academia Sinica); Chen, Ting-Li (Institute of Statistical Science, Academia Sinica; tlchen@stat.sinica.edu.tw)
ContentType	Journal Article
Copyright	The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Discipline	Statistics Mathematics Computer Science
GrantInformation	Ministry of Science and Technology (grant MOST 105-2118-M-305-003)
IsPeerReviewed	true
IsScholarly	true
Issue	1
Keywords	Randomized algorithm; Clustering analysis; Large-scale data
PublicationCentury	2000
PublicationDate	2024-02-01
PublicationPlace	New York
PublicationPlace_xml	– name: New York – name: Dordrecht
PublicationTitle	Statistics and computing
PublicationTitleAbbrev	Stat Comput
PublicationYear	2024
Publisher	Springer US Springer Nature B.V
URI	https://link.springer.com/article/10.1007/s11222-023-10355-8 https://www.proquest.com/docview/2893433365
Volume	34