High-Dimensional Cluster Analysis with the Masked EM Algorithm

Bibliographic Details
Published in	Neural Computation, Vol. 26, no. 11, pp. 2379-2394
Main Authors	Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
Format	Journal Article
Language	English
Published	MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.11.2014
Subjects	Action Potentials - physiology; Algorithms; Cluster Analysis; Data points; Humans; Mathematical analysis; Models, Neurological; Models, Theoretical; Neurons - physiology; Rendering; Sorting; Spikes; Vectors (mathematics)
ISSN	0899-7667, 1530-888X
DOI	10.1162/NECO_a_00661

Summary:	Cluster analysis faces two problems in high dimensions: the “curse of dimensionality” that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data.
Bibliography:	November, 2014