Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data
| Published in | Journal of Classification, Vol. 29, no. 2, pp. 144-169 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York, Springer-Verlag, 01.07.2012, Springer Nature B.V |
| Subjects | |
| ISSN | 0176-4268 1432-1343 |
| DOI | 10.1007/s00357-012-9105-4 |
| Summary: | Clustering techniques are based on a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements (compositions) are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the fuzzy c-means (FCM) algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed on the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study illustrates the approach using a nutritional data set known in the clustering literature. |
|---|---|
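
The scale-invariance principle named in the summary can be illustrated with a small sketch. It assumes the centered log-ratio (clr) transformation as one of the common log-ratio transformations and uses the Aitchison distance, which equals the Euclidean distance between clr-transformed compositions. The function names and sample values are hypothetical, and this is not presented as the strategy the paper proposes, only as an illustration of why a plain Euclidean distance can misbehave on compositions.

```python
import math


def closure(x, total=1.0):
    """Rescale positive parts so they sum to `total` (constant-sum constraint)."""
    s = sum(x)
    return [total * v / s for v in x]


def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return euclidean(clr(x), clr(y))


# Hypothetical three-part compositions (illustrative values only).
x = [0.1, 0.3, 0.6]
y = [0.2, 0.5, 0.3]
x_scaled = [10 * v for v in x]  # same composition expressed in other units

d_ait = aitchison_distance(x, y)
d_ait_scaled = aitchison_distance(x_scaled, y)
d_euc = euclidean(x, y)
d_euc_scaled = euclidean(x_scaled, y)

# The Aitchison distance ignores the rescaling; the raw Euclidean one does not.
print(abs(d_ait - d_ait_scaled) < 1e-12)
print(abs(d_euc - d_euc_scaled) > 1.0)
```

Under these assumptions, a distance-based method such as FCM could be run on the clr coordinates with the usual Euclidean distance, which amounts to using the Aitchison distance on the original compositions.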