Exploring and Comparing Clusterings of Multivariate Data Sets Using Persistent Homology
| Published in | Computer Graphics Forum, Vol. 35, no. 3, pp. 81-90 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Oxford: Blackwell Publishing Ltd, 01.06.2016 |
| Subjects | |
| ISSN | 0167-7055 1467-8659 |
| DOI | 10.1111/cgf.12884 |
| Summary: | Clustering algorithms support exploratory data analysis by grouping inputs that share similar features. Clustering unlabelled data is widely regarded as especially difficult, because users must choose not only a suitable clustering algorithm but also a suitable number of clusters. Existing clustering validity measures have known issues: they are unstable in the presence of noise, make restrictive assumptions about cluster shapes, and cannot evaluate individual clusters locally. We present a new measure for assessing and comparing different clusterings on both a global and a local level. Our measure is based on the topological method of persistent homology, which is stable and unbiased towards cluster shapes. Based on our measure, we also describe a new visualization that displays similarities between different clusterings (using a global graph view) and supports their comparison on the individual cluster level (using a local glyph view). We demonstrate how our visualization helps detect different but equally valid clusterings of data sets from multiple application domains. |
|---|---|
| Bibliography: | ArticleID:CGF12884; ark:/67375/WNG-SP5J9SLZ-J; istex:5CEB5409759A0D94F71A08B597079CA733F9E0E6; Supporting Information |