Exploring and Comparing Clusterings of Multivariate Data Sets Using Persistent Homology

Bibliographic Details
Published in: Computer Graphics Forum, Vol. 35, No. 3, pp. 81–90
Main Authors: Rieck, B.; Leitte, H.
Format: Journal Article
Language: English
Published: Oxford, Blackwell Publishing Ltd, 01.06.2016
ISSN: 0167-7055; 1467-8659
DOI: 10.1111/cgf.12884

More Information
Summary: Clustering algorithms support exploratory data analysis by grouping inputs that share similar features. Clustering unlabelled data in particular is said to be a fiendishly difficult problem, because users have to choose not only a suitable clustering algorithm but also a suitable number of clusters. Known issues with existing clustering validity measures include instability in the presence of noise and restrictive assumptions about cluster shapes. In addition, these measures cannot evaluate individual clusters locally. We present a new measure for assessing and comparing different clusterings on both a global and a local level. Our measure is based on the topological method of persistent homology, which is stable and unbiased towards cluster shapes. Based on this measure, we also describe a new visualization that displays similarities between different clusterings (using a global graph view) and supports their comparison at the level of individual clusters (using a local glyph view). We demonstrate how our visualization helps detect different but equally valid clusterings of data sets from multiple application domains.
Article ID: CGF12884
ARK: ark:/67375/WNG-SP5J9SLZ-J
ISTEX: 5CEB5409759A0D94F71A08B597079CA733F9E0E6