Stability estimation for unsupervised clustering: A review
| Published in | Wiley interdisciplinary reviews. Computational statistics, Vol. 14, no. 6, article e1575 |
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Hoboken, USA: John Wiley & Sons, Inc, 01.11.2022 |
| Subjects | |
| ISSN | 1939-5108, 1939-0068 |
| DOI | 10.1002/wics.1575 | 
| Summary: | Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning. This is due in part to the fact that there are no labels or gold standards by which performance can be measured. Moreover, the wide range of available clustering methods is governed by different objective functions, parameters, and dissimilarity measures. Clustering serves many purposes, often playing critical roles both in the early stages of exploratory data analysis and as an endpoint for knowledge and discovery. Thus, understanding the quality of a clustering is of critical importance. The concept of stability has emerged as a strategy for assessing the performance and reproducibility of data clustering. The key idea is to produce perturbed data sets that are very close to the original, and cluster them. If the clustering is stable, then the clusters from the original data will be preserved in the clustering of the perturbed data. The nature of the perturbation and the methods for quantifying similarity between clusterings are nontrivial, and are ultimately what set many of the stability estimation methods apart. In this review, we provide an overview of the very active research area of cluster stability estimation and discuss some of the open questions and challenges that remain in the field. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification. Grouping items into clusters is a complex problem in unsupervised learning with inherent uncertainty. Stability is a measurement that characterizes the strength and reproducibility of a cluster and an item's membership in a cluster. |
| Bibliography: | Funding information: Rachael Hageman Blair was supported by NSF DMS 1557589. Han Yu was supported by the National Cancer Institute Cancer Center Support (Grant P30CA016056) and National Cancer Institute IOTN Moonshot (Grant U24CA232979). Edited by Nicole Lazar, Commissioning Editor, and David Scott, Co-Editor-in-Chief. Correction added on 21 January 2022, after first online publication: The copyright line was changed. |
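
The key idea stated in the summary (perturb the data, re-cluster, and check whether the original clusters are preserved) can be illustrated with a minimal sketch. Everything concrete here is an assumption chosen for illustration and is not taken from the review: the perturbation is random subsampling without replacement, the clustering method is a plain Lloyd's k-means, the similarity between clusterings is the unadjusted Rand index, and the names `kmeans`, `rand_index`, `stability`, and `two_blobs` are hypothetical helpers. The review itself stresses that the choice of perturbation and similarity measure is nontrivial, so this is only one possible instance.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns labels."""
    centers = rng.sample(points, k)  # random distinct points as initial centers
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_rounds=20, frac=0.8, seed=0):
    """Mean Rand index between the reference clustering and clusterings of subsamples."""
    rng = random.Random(seed)
    reference = kmeans(points, k, rng)
    scores = []
    for _ in range(n_rounds):
        # Perturbation (assumed): keep a random fraction of the items.
        idx = rng.sample(range(len(points)), int(frac * len(points)))
        perturbed = kmeans([points[i] for i in idx], k, rng)
        # Compare only on the items that appear in the subsample.
        scores.append(rand_index([reference[i] for i in idx], perturbed))
    return sum(scores) / len(scores)


def two_blobs(n_per_blob=30, seed=1):
    """Synthetic data (assumed): two well-separated Gaussian blobs in 2D."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (10, 10)]
        for _ in range(n_per_blob)
    ]


if __name__ == "__main__":
    print(stability(two_blobs(), k=2))
```

On well-separated data such as `two_blobs()`, a score near 1 would be expected with `k=2`, while a poorly chosen `k` or noisier data would be expected to lower it. The Rand index is used because it is invariant to how cluster labels are numbered; a chance-corrected measure could be swapped in without changing the overall structure.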