A geometric analysis of subspace clustering with outliers

Bibliographic Details
Published in	arXiv.org
Main Authors	Soltanolkotabi, Mahdi; Candès, Emmanuel J
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 30.01.2013
Subjects	Algorithms; Clustering; Computer Science - Information Theory; Computer Science - Learning; Computer vision; Data analysis; Data points; Mathematics - Information Theory; Mathematics - Statistics Theory; Outliers (statistics); Pattern recognition; Statistics - Machine Learning; Statistics - Theory; Subspaces
ISSN	2331-8422
DOI	10.48550/arxiv.1112.4258

More Information
Summary:	This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 2790-2797. IEEE], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods.
Bibliography:	IMS-AOS-AOS1034