One-Pass Additive-Error Subset Selection for ℓp Subspace Approximation and (k, p)-Clustering

Bibliographic Details
Published in: Algorithmica, Vol. 85, No. 10, pp. 3144–3167
Main Authors: Deshpande, Amit; Pratap, Rameshwar
Format: Journal Article
Language: English
Published: Springer US, New York, 01.10.2023 (Springer Nature B.V.)
ISSN: 0178-4617
EISSN: 1432-0541
DOI: 10.1007/s00453-023-01124-0


Abstract: We consider the problem of subset selection for ℓp subspace approximation and (k, p)-clustering. Our aim is to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. For ℓp subspace approximation, previously known subset selection algorithms based on volume sampling and adaptive sampling, proposed in Deshpande and Varadarajan (STOC’07, 2007) for the general case of p ∈ [1, ∞), require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for ℓp subspace approximation, for any p ∈ [1, ∞). Earlier subset selection algorithms that give a one-pass multiplicative (1 + ϵ) approximation work only in special cases: Cohen et al. (SODA’17, 2017) give a one-pass subset selection with a multiplicative (1 + ϵ) approximation guarantee for the special case of ℓ2 subspace approximation, and Mahabadi et al. (STOC’20, 2020) give a one-pass noisy subset selection with a (1 + ϵ) approximation guarantee for ℓp subspace approximation when p ∈ {1, 2}. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any p ∈ [1, ∞). We also consider (k, p)-clustering, where the task is to group the data points into k clusters such that the sum of distances from points to their cluster centers, raised to the power p, is minimized, for p ∈ [1, ∞). The known subset selection algorithms are based on D^p sampling due to Wei (NIPS’16, 2016), an extension of the D² sampling proposed in Arthur and Vassilvitskii (SODA’07, 2007). Due to the inherently adaptive nature of D^p sampling, these algorithms require multiple passes over the input. In this work, we give a one-pass subset selection for (k, p)-clustering that achieves a constant-factor approximation with respect to the optimal solution, together with an additive approximation guarantee. Bachem et al. (NIPS’16, 2016) also give a one-pass subset selection for k-means, i.e., the case p = 2; our result handles the more general problem for any p ∈ [1, ∞). At the core, our contribution lies in showing a one-pass MCMC-based subset selection algorithm whose cost on the sampled points closely approximates the corresponding optimal cost, with high probability.
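Example (not from the paper): a minimal numpy sketch of the D^p sampling primitive named in the abstract, i.e., D² sampling generalized from p = 2 to arbitrary p ≥ 1; the function name dp_sampling and the dense-array input are our own illustrative assumptions. Each of the k rounds recomputes every point's distance to the current set of centers, which is exactly the adaptivity that forces one pass over the data per sampled center.

# Illustrative D^p sampling (k-means++ seeding generalized to exponent p).
# Each round makes a fresh pass over all n points.
import numpy as np

def dp_sampling(X, k, p, seed=None):
    """Draw k centers from the rows of X; each new center is chosen with
    probability proportional to (distance to its nearest center)^p."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[[rng.integers(n)]]        # first center: uniform at random
    for _ in range(k - 1):
        # one pass: distance of every point to its nearest current center
        d = np.min(np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                  axis=2), axis=1) ** p
        probs = d / d.sum()               # the D^p distribution
        # (assumes some point is still at positive distance from the centers)
        centers = np.vstack([centers, X[[rng.choice(n, p=probs)]]])
    return centers

With p = 2 this is exactly the k-means++ seeding of Arthur and Vassilvitskii; the k sequential passes over the data are what the paper's one-pass algorithm avoids.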
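Example (again a sketch under our own simplifying assumptions, in the spirit of the MCMC-based seedings of Bachem et al.): a short Metropolis chain can approximate a single D^p draw without a full pass over the data, because with a uniform proposal the acceptance ratio needs only the two competing points' distances. The stationary distribution of the chain is the exact D^p distribution, and the chain length m trades accuracy for speed; the paper's actual one-pass algorithm and its additive guarantee differ in the details.

# Approximating one D^p draw by an m-step Metropolis chain with a uniform
# proposal over the rows of X; cost O(m * d) per draw instead of O(n * d).
import numpy as np

def dist_p(x, centers, p):
    """(Distance from x to its nearest center) raised to the power p."""
    return np.min(np.linalg.norm(centers - x, axis=1)) ** p

def mcmc_dp_draw(X, centers, p, m, rng):
    cur = X[rng.integers(len(X))]
    cur_w = dist_p(cur, centers, p)
    for _ in range(m - 1):
        cand = X[rng.integers(len(X))]
        cand_w = dist_p(cand, centers, p)
        # Metropolis step: accept with probability min(1, cand_w / cur_w);
        # the uniform proposal density cancels in the ratio
        if cur_w == 0 or rng.random() * cur_w < cand_w:
            cur, cur_w = cand, cand_w
    return cur

Swapping this draw into dp_sampling above replaces the exact per-center pass, which is roughly the flavor of the one-pass MCMC-based subset selection that the abstract describes as the paper's core contribution.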
Authors: Deshpande, Amit (Microsoft Research, Bangalore); Pratap, Rameshwar (Department of Computer Science and Engineering, IIT Hyderabad; email: rameshwar@cse.iith.ac.in)
BookMark eNpFkF9Kw0AQxhepYFu9gE8BXxRcnf2XbR5LqX-goFB9XjbZiaTWTcwm4gG8gRfwLB7Fk7htBWGGgfk-Zj5-IzLwtUdCjhlcMAB9GQCkEhR4bMa4pLBHhkwKTkFJNiBDYHpCZcr0ARmFsAJgXGfpkJg7j_TehpBMnau66g3pvG3rNln2ecAuWeIai66qfVLG5c_HZ7NVGltgMm2atn6vXuxWt94lp8_n31_NGZ2t-9BhW_mnQ7Jf2nXAo785Jo9X84fZDV3cXd_OpgvacK47qlDkSrmyAAQueZ5haWNxy9C6iXMlWuU4qDxWKjIRHRYyWUid5Zq7UozJye5ujPTaY-jMqu5bH18aPkkznmaSsegSO1doNuGw_XcxMBuSZkfSRJJmS9KA-AWZZ2p8
ContentType Journal Article
Copyright: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Database: ProQuest Computer Science Collection

Discipline: Computer Science
Peer Reviewed: Yes
Scholarly: Yes
Keywords: Subspace approximation; Subset selection; k-means clustering
Page Count: 24
References
Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: simple and deterministic matrix sketching. CoRR abs/1501.01711 (2015). arXiv:1501.01711
Deshpande, A., Varadarajan, K.: Sampling-based dimension reduction for subspace approximation. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC '07), pp. 641–650. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1250790.1250884
Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering, pp. 15–28 (2009). https://doi.org/10.1007/978-3-642-03685-9_2
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
Ida, Y., Kanai, S., Fujiwara, Y., Iwata, T., Takeuchi, K., Kashima, H.: Fast deterministic CUR matrix decomposition with accuracy assurance. In: International Conference on Machine Learning, pp. 4594–4603 (2020). PMLR
Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982). https://doi.org/10.1093/biomet/69.3.653
Boutsidis, C., Mahoney, M.W., Drineas, P.: An improved approximation algorithm for the column subset selection problem. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 968–977 (2009). SIAM
Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. Advances in Neural Information Processing Systems 22 (2009)
Liberty, E.: Simple and deterministic matrix sketching. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), pp. 581–588. Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2487575.2487623
Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Fast and provably good seedings for k-means. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16), pp. 55–63. Curran Associates Inc., Red Hook (2016)
Deshpande, A., Vempala, S.: Adaptive sampling and fast low-rank matrix approximation. In: Proceedings of the 9th International Conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th International Conference on Randomization and Computation (APPROX'06/RANDOM'06), pp. 292–303. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11830924_28
Bachem, O., Lucic, M., Hassani, S.H., Krause, A.: Approximate k-means++ in sublinear time. Proceedings of the AAAI Conference on Artificial Intelligence 30(1) (2016). https://doi.org/10.1609/aaai.v30i1.10259
Guruswami, V., Sinop, A.K.: Optimal column-based low-rank matrix reconstruction. In: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '12), pp. 1207–1214. Society for Industrial and Applied Mathematics, USA (2012)
Cohen, M.B., Musco, C., Musco, C.: Input sparsity time low-rank approximation via ridge leverage score sampling. In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '17), pp. 1758–1777. Society for Industrial and Applied Mathematics, USA (2017)
Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 1954–1963 (2015)
Dan, C., Wang, H., Zhang, H., Zhou, Y., Ravikumar, P.K.: Optimal analysis of subset-selection based ℓp low-rank approximation. Advances in Neural Information Processing Systems 32 (2019)
Cormode, G., Dickens, C., Woodruff, D.: Leveraging well-conditioned bases: streaming and distributed summaries in Minkowski p-norms. In: International Conference on Machine Learning, pp. 1048–1056 (2018)
Wei, D.: A constant-factor bi-criteria approximation guarantee for k-means++. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16), pp. 604–612. Curran Associates Inc., Red Hook (2016)
Efraimidis, P.S., Spirakis, P.P.: Weighted random sampling. In: Encyclopedia of Algorithms, pp. 2365–2367 (2016). https://doi.org/10.1007/978-1-4939-2864-4_478
Feldman, D.: Introduction to core-sets: an updated survey (2020)
Frieze, A.M., Kannan, R., Vempala, S.S.: Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51(6), 1025–1041 (2004). https://doi.org/10.1145/1039488.1039494
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '07), pp. 1027–1035. Society for Industrial and Applied Mathematics, USA (2007)
Chierichetti, F., Gollapudi, S., Kumar, R., Lattanzi, S., Panigrahy, R., Woodruff, D.P.: Algorithms for ℓp low-rank approximation. In: International Conference on Machine Learning, pp. 806–814 (2017). PMLR
Mahoney, M.W., Maggioni, M., Drineas, P.: Tensor-CUR decompositions for tensor-based data. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 327–336 (2006)
Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009). https://doi.org/10.1007/s10994-009-5103-0
Wang, S., Zhang, Z.: Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. J. Mach. Learn. Res. 14(1), 2729–2769 (2013)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1+ϵ)-approximation algorithm for k-means clustering in any dimensions. In: 45th Annual IEEE Symposium on Foundations of Computer Science, pp. 454–462 (2004). IEEE
Cai, H.: Exact bound for the convergence of Metropolis chains. Stoch. Anal. Appl. 18(1), 63–71 (2000). https://doi.org/10.1080/07362990008809654
Mahajan, M., Nimbhorkar, P., Varadarajan, K.R.: The planar k-means problem is NP-hard. Theor. Comput. Sci. 442, 13–21 (2012). https://doi.org/10.1016/j.tcs.2010.05.034
Deshpande, A., Rademacher, L., Vempala, S.S., Wang, G.: Matrix approximation and projective clustering via volume sampling. Theory Comput. 2(12), 225–247 (2006). https://doi.org/10.4086/toc.2006.v002a012
Clarkson, K.L., Woodruff, D.P.: Low-rank approximation and regression in input sparsity time. J. ACM 63(6) (2017). https://doi.org/10.1145/3019134
Drineas, P., Mahoney, M.W., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30(2), 844–881 (2008). https://doi.org/10.1137/07070471X
Broadbent, M.E., Brown, M., Penner, K., Ipsen, I., Rehman, R.: Subset selection algorithms: randomized vs. deterministic. SIAM Undergraduate Research Online 3(01) (2010)
Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012). https://doi.org/10.14778/2180912.2180915
Feldman, D., Monemizadeh, M., Sohler, C., Woodruff, D.P.: Coresets and sketches for high dimensional subspace approximation problems. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 630–649 (2010). SIAM
Ghashami, M., Phillips, J.M.: Relative errors for deterministic low-rank matrix approximations. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '14), pp. 707–717. Society for Industrial and Applied Mathematics, USA (2014)
Mahoney, M.W.: Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3(2), 123–224 (2011). https://doi.org/10.1561/2200000035
Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985). https://doi.org/10.1145/3147.3165
Drineas, P., Kerenidis, I., Raghavan, P.: Competitive recommendation systems. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 82–90 (2002)
Ghashami, M., Liberty, E., Phillips, J.M., Woodruff, D.P.: Frequent directions: simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016). https://doi.org/10.1137/15M1009718
Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA 106(3), 697–702 (2009). https://doi.org/10.1073/pnas.0803205106
Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: compact matrix decomposition for large sparse graphs. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 366–377 (2007). SIAM
Jaiswal, R., Kumar, A., Sen, S.: A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica 70(1), 22–46 (2014). https://doi.org/10.1007/s00453-013-9833-9
Jaiswal, R., Kumar, M., Yadav, P.: Improved analysis of D²-sampling based PTAS for k-means and other clustering problems. Inf. Process. Lett. 115(2), 100–103 (2015). https://doi.org/10.1016/j.ipl.2014.07.009
Anari, N., Gharan, S.O., Rezaei, A.: Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In: Conference on Learning Theory, pp. 103–115 (2016). PMLR
Derezinski, M., Warmuth, M.K.: Unbiased estimates for linear regression via volume sampling. Advances in Neural Information Processing Systems 30 (2017)
Mahabadi, S., Razenshteyn, I., Woodruff, D.P., Zhou, S.: Non-adaptive adaptive sampling on turnstile streams. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC 2020), pp. 1251–1264. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3357713.3384331
Braverman, V., Drineas, P., Musco, C., Musco, C., Upadhyay, J., Woodruff, D.P., Zhou, S.: Near optimal linear algebra in the online and sliding window models. In: 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 517–528 (2020). IEEE
Ban, F., Bhattiprolu, V., Bringmann, K., Kolev, P., Lee, E., Woodruff, D.P.: A PTAS for ℓp-low rank approximation. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 747–766 (2019). SIAM
Subject Terms: Adaptive sampling; Algorithm Analysis and Problem Complexity; Algorithms; Approximation; Clustering; Computer Science; Computer Systems Organization and Communication Networks; Data points; Data Structures and Information Theory; Mathematical analysis; Mathematics of Computing; Optimization; Subspaces; Theory of Computation
Online Access:
https://link.springer.com/article/10.1007/s00453-023-01124-0
https://www.proquest.com/docview/2869269411