Combining multiple clusterings using evidence accumulation

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, no. 6, pp. 835-850
Main Authors	Fred, Ana L.N.; Jain, Anil K.
Format	Journal Article
Language	English
Published	Los Alamitos, CA: IEEE; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2005
Subjects	Agglomeration | Algorithm design and analysis | Algorithms | Applied sciences | Artificial Intelligence | Breast Neoplasms - diagnosis | Cluster Analysis | cluster fusion | cluster validity | Clustering | Clustering algorithms | Clusters | combining clustering partitions | Computer science; control theory; systems | Computer Simulation | evidence accumulation | Exact sciences and technology | Feature extraction | Humans | Image Interpretation, Computer-Assisted - methods | Information Storage and Retrieval - methods | K-means algorithm | Models, Biological | Models, Statistical | Mutual information | Partitioning algorithms | Partitions | Pattern Recognition, Automated - methods | Pattern recognition. Digital image processing. Computational geometry | Representations | robust clustering | Robustness | Shape | Similarity | single-link method | Strategy | Partition | Cluster | Validity
ISSN	0162-8828 2160-9292 1939-3539
DOI	10.1109/TPAMI.2005.113

Summary:	We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, that is, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), data partitions can be produced by 1) applying different clustering algorithms or 2) applying the same clustering algorithm with different parameter values or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization, and the individual data partitions are combined, based on a voting mechanism, to generate a new n × n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the k-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.