One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering
| Published in | Algorithmica, Vol. 85, No. 10, pp. 3144–3167 |
|---|---|
| Main Authors | Deshpande, Amit; Pratap, Rameshwar |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.10.2023 (Springer Nature B.V) |
| ISSN | 0178-4617 (print); 1432-0541 (electronic) |
| DOI | 10.1007/s00453-023-01124-0 |
| Abstract | We consider the problem of subset selection for ℓp subspace approximation and (k, p)-clustering. Our aim is to efficiently find a *small* subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. For ℓp subspace approximation, the previously known subset selection algorithms based on volume sampling and adaptive sampling, proposed in Deshpande and Varadarajan (STOC’07, 2007) for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓp subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1 + ϵ) approximation work only in special cases: Cohen et al. (SODA’17, 2017) give a one-pass subset selection with a multiplicative (1 + ϵ) approximation guarantee for the special case of ℓ2 subspace approximation, and Mahabadi et al. (STOC’20, 2020) give a one-pass *noisy* subset selection with a (1 + ϵ) approximation guarantee for ℓp subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞). We also consider (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to their cluster centers, raised to the power p, is minimized, for p ∈ [1, ∞). The known subset selection algorithms for this problem are based on D^p sampling due to Wei (NIPS’16, 2016), an extension of the D^2 sampling proposed in Arthur and Vassilvitskii (SODA’07, 2007). Due to the inherently adaptive nature of D^p sampling, these algorithms require multiple passes over the input. In this work, we give a one-pass subset selection for (k, p)-clustering that achieves a constant-factor approximation to the optimal solution, up to an additive error term. Bachem et al. (NIPS’16, 2016) also give a one-pass subset selection for k-means, i.e., the case p = 2; our result handles the more general problem for any p ∈ [1, ∞). At the core, our contribution lies in showing a one-pass MCMC-based subset selection algorithm whose cost on the sampled points closely approximates the corresponding optimal cost, with high probability. |
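The abstract's central objects are easy to state concretely: in (k, p)-clustering one seeks k centers C minimizing cost_p(X, C) = Σ_{x ∈ X} min_{c ∈ C} ‖x − c‖^p, and the multi-pass baseline the paper improves on is adaptive D^p sampling (Wei, NIPS’16), which generalizes the k-means++ D^2 seeding of Arthur and Vassilvitskii (SODA’07). The following is a minimal sketch of that standard baseline, not the authors' pseudocode; the function name and structure are ours.

```python
# Adaptive D^p sampling: each new center is drawn with probability
# proportional to the p-th power of its distance to the centers so far.
import numpy as np

def dp_sampling(X, k, p, seed=None):
    """Pick k rows of X as centers via adaptive D^p sampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                  # first center: uniform
    dist = np.linalg.norm(X - centers[0], axis=1)   # dist to nearest center
    for _ in range(k - 1):
        prob = dist ** p                            # D^p distribution ...
        prob /= prob.sum()                          # ... over all n points
        idx = rng.choice(n, p=prob)
        centers.append(X[idx])
        # Adaptive step: refresh distances against the new center,
        # which requires touching every point again.
        dist = np.minimum(dist, np.linalg.norm(X - X[idx], axis=1))
    return np.array(centers)
```

Each of the k rounds recomputes the distribution from the current centers, so selecting k points costs k passes over the data; this adaptivity is exactly what the one-pass algorithm in the paper removes.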
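The abstract's closing claim, a one-pass MCMC-based subset selection, rests on the idea (used by Bachem et al., NIPS’16, for the k-means case p = 2) that a D^p draw can be approximated by a short Metropolis-Hastings chain instead of materializing the full distribution over all n points. The sketch below is our own illustration of that idea under a random-access assumption; the paper's actual streaming algorithm is not reproduced here, and all names are hypothetical.

```python
# Approximate one D^p draw with a Metropolis-Hastings chain whose
# stationary distribution is proportional to dist(x, centers)^p.
import numpy as np

def mcmc_dp_draw(X, centers, p, chain_len, rng):
    """Index of one approximate D^p sample; touches only O(chain_len) points."""
    def mass(i):  # unnormalized target density of point i
        return np.linalg.norm(X[i] - centers, axis=1).min() ** p

    cur = rng.integers(X.shape[0])        # start from a uniform proposal
    cur_mass = mass(cur)
    for _ in range(chain_len - 1):
        cand = rng.integers(X.shape[0])   # symmetric (uniform) proposal
        cand_mass = mass(cand)
        # Metropolis acceptance: move with probability min(1, cand/cur).
        if cur_mass == 0 or rng.random() < cand_mass / cur_mass:
            cur, cur_mass = cand, cand_mass
    return cur

# Example usage:
#   rng = np.random.default_rng(0)
#   idx = mcmc_dp_draw(X, centers, p=2, chain_len=200, rng=rng)
```

Each chain step evaluates distances for a single candidate point, so one draw costs O(chain_len) distance computations rather than a full pass over X; the chain length controls how closely the draw tracks the exact D^p distribution.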
| Authors | Deshpande, Amit (Microsoft Research, Bangalore); Pratap, Rameshwar (Department of Computer Science and Engineering, IIT Hyderabad; rameshwar@cse.iith.ac.in) |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
| DOI | 10.1007/s00453-023-01124-0 |
| Discipline | Computer Science |
| EISSN | 1432-0541 |
| EndPage | 3167 |
| ISSN | 0178-4617 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Keywords | Subspace approximation; Subset selection; k-means clustering |
| Language | English |
| PageCount | 24 |
| PublicationDate | 2023-10-01 |
| PublicationPlace | New York |
| PublicationTitle | Algorithmica |
| PublicationTitleAbbrev | Algorithmica |
| PublicationYear | 2023 |
| Publisher | Springer US; Springer Nature B.V |
| References |
– Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: simple and deterministic matrix sketching. CoRR abs/1501.01711 (2015). arXiv:1501.01711
– Deshpande, A., Varadarajan, K.: Sampling-based dimension reduction for subspace approximation. In: STOC ’07, pp. 641–650. ACM (2007). https://doi.org/10.1145/1250790.1250884
– Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering, pp. 15–28 (2009). https://doi.org/10.1007/978-3-642-03685-9_2
– Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
– Ida, Y., Kanai, S., Fujiwara, Y., Iwata, T., Takeuchi, K., Kashima, H.: Fast deterministic CUR matrix decomposition with accuracy assurance. In: ICML, pp. 4594–4603. PMLR (2020)
– Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982). https://doi.org/10.1093/biomet/69.3.653
– Boutsidis, C., Mahoney, M.W., Drineas, P.: An improved approximation algorithm for the column subset selection problem. In: SODA, pp. 968–977. SIAM (2009)
– Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Advances in Neural Information Processing Systems 22 (2009)
– Liberty, E.: Simple and deterministic matrix sketching. In: KDD ’13, pp. 581–588. ACM (2013). https://doi.org/10.1145/2487575.2487623
– Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Fast and provably good seedings for k-means. In: NIPS ’16, pp. 55–63. Curran Associates (2016)
– Deshpande, A., Vempala, S.: Adaptive sampling and fast low-rank matrix approximation. In: APPROX/RANDOM ’06, pp. 292–303. Springer (2006). https://doi.org/10.1007/11830924_28
– Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Approximate k-means++ in sublinear time. In: Proceedings of the AAAI Conference on Artificial Intelligence 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10259
– Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: SODA ’12, pp. 1207–1214. SIAM (2012)
– Cohen, M.B., Musco, C., Musco, C.: Input sparsity time low-rank approximation via ridge leverage score sampling. In: SODA ’17, pp. 1758–1777. SIAM (2017)
– Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: ICML, vol. 37, pp. 1954–1963 (2015)
– Dan, C., Wang, H., Zhang, H., Zhou, Y., Ravikumar, P.K.: Optimal analysis of subset-selection based ℓp low-rank approximation. In: Advances in Neural Information Processing Systems 32 (2019)
– Cormode, G., Dickens, C., Woodruff, D.: Leveraging well-conditioned bases: streaming and distributed summaries in Minkowski p-norms. In: ICML, pp. 1048–1056 (2018)
– Wei, D.: A constant-factor bi-criteria approximation guarantee for k-means++. In: NIPS ’16, pp. 604–612. Curran Associates (2016)
– Efraimidis, P.S., Spirakis, P.P.: Weighted random sampling. In: Encyclopedia of Algorithms, pp. 2365–2367 (2016). https://doi.org/10.1007/978-1-4939-2864-4_478
– Feldman, D.: Introduction to core-sets: an updated survey (2020)
– Frieze, A.M., Kannan, R., Vempala, S.S.: Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51(6), 1025–1041 (2004). https://doi.org/10.1145/1039488.1039494
– Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA ’07, pp. 1027–1035. SIAM (2007)
– Chierichetti, F., Gollapudi, S., Kumar, R., Lattanzi, S., Panigrahy, R., Woodruff, D.P.: Algorithms for ℓp low-rank approximation. In: ICML, pp. 806–814. PMLR (2017)
– Mahoney, M.W., Maggioni, M., Drineas, P.: Tensor-CUR decompositions for tensor-based data. In: KDD, pp. 327–336 (2006)
– Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009). https://doi.org/10.1007/s10994-009-5103-0
– Wang, S., Zhang, Z.: Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. J. Mach. Learn. Res. 14(1), 2729–2769 (2013)
– Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1+ϵ)-approximation algorithm for k-means clustering in any dimensions. In: FOCS, pp. 454–462. IEEE (2004)
– Cai, H.: Exact bound for the convergence of Metropolis chains. Stoch. Anal. Appl. 18(1), 63–71 (2000). https://doi.org/10.1080/07362990008809654
– Mahajan, M., Nimbhorkar, P., Varadarajan, K.R.: The planar k-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012). https://doi.org/10.1016/j.tcs.2010.05.034
– Deshpande, A., Rademacher, L., Vempala, S.S., Wang, G.: Matrix approximation and projective clustering via volume sampling. Theory Comput. 2(12), 225–247 (2006). https://doi.org/10.4086/toc.2006.v002a012
– Clarkson, K.L., Woodruff, D.P.: Low-rank approximation and regression in input sparsity time. J. ACM 63(6) (2017). https://doi.org/10.1145/3019134
– Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30(2), 844–881 (2008). https://doi.org/10.1137/07070471X
– Broadbent, M.E., Brown, M., Penner, K., Ipsen, I., Rehman, R.: Subset selection algorithms: randomized vs. deterministic. SIAM Undergraduate Research Online 3(1) (2010)
– Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012). https://doi.org/10.14778/2180912.2180915
– Feldman, D., Monemizadeh, M., Sohler, C., Woodruff, D.P.: Coresets and sketches for high dimensional subspace approximation problems. In: SODA, pp. 630–649. SIAM (2010)
– Ghashami, M., Phillips, J.M.: Relative errors for deterministic low-rank matrix approximations. In: SODA ’14, pp. 707–717. SIAM (2014)
– Mahoney, M.W.: Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3(2), 123–224 (2011). https://doi.org/10.1561/2200000035
– Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985). https://doi.org/10.1145/3147.3165
– Drineas, P., Kerenidis, I., Raghavan, P.: Competitive recommendation systems. In: STOC, pp. 82–90 (2002)
– Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016). https://doi.org/10.1137/15M1009718
– Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA 106(3), 697–702 (2009). https://doi.org/10.1073/pnas.0803205106
– Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: compact matrix decomposition for large sparse graphs. In: SDM, pp. 366–377. SIAM (2007)
– Jaiswal, R., Kumar, A., Sen, S.: A simple D^2-sampling based PTAS for k-means and other clustering problems. Algorithmica 70(1), 22–46 (2014). https://doi.org/10.1007/s00453-013-9833-9
– Jaiswal, R., Kumar, M., Yadav, P.: Improved analysis of D^2-sampling based PTAS for k-means and other clustering problems. Inf. Process. Lett. 115(2), 100–103 (2015). https://doi.org/10.1016/j.ipl.2014.07.009
– Anari, N., Gharan, S.O., Rezaei, A.: Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In: COLT, pp. 103–115. PMLR (2016)
– Derezinski, M., Warmuth, M.K.: Unbiased estimates for linear regression via volume sampling. In: Advances in Neural Information Processing Systems 30 (2017)
– Mahabadi, S., Razenshteyn, I., Woodruff, D.P., Zhou, S.: Non-adaptive adaptive sampling on turnstile streams. In: STOC 2020, pp. 1251–1264. ACM (2020). https://doi.org/10.1145/3357713.3384331
– Braverman, V., Drineas, P., Musco, C., Musco, C., Upadhyay, J., Woodruff, D.P., Zhou, S.: Near optimal linear algebra in the online and sliding window models. In: FOCS, pp. 517–528. IEEE (2020)
– Ban, F., Bhattiprolu, V., Bringmann, K., Kolev, P., Lee, E., Woodruff, D.P.: A PTAS for ℓp-low rank approximation. In: SODA, pp. 747–766. SIAM (2019) |
| StartPage | 3144 |
| SubjectTerms | Adaptive sampling; Algorithm Analysis and Problem Complexity; Algorithms; Approximation; Clustering; Computer Science; Computer Systems Organization and Communication Networks; Data points; Data Structures and Information Theory; Mathematical analysis; Mathematics of Computing; Optimization; Subspaces; Theory of Computation |
| Title | One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering |
| URI | https://link.springer.com/article/10.1007/s00453-023-01124-0 https://www.proquest.com/docview/2869269411 |
| Volume | 85 |