High-Dimensional Cluster Analysis with the Masked EM Algorithm
| Published in | Neural Computation Vol. 26; no. 11; pp. 2379-2394 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.11.2014 |
| ISSN | 0899-7667, 1530-888X |
| DOI | 10.1162/NECO_a_00661 |
| Summary: | Cluster analysis faces two problems in high dimensions: the “curse of dimensionality” that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data. |
| Bibliography: | November, 2014 |