Non-redundant data clustering

Bibliographic Details
Published in	Knowledge and information systems, Vol. 12, no. 1, pp. 1-24
Main Authors	Gondek, David, Hofmann, Thomas
Format	Journal Article
Language	English
Published	London: Springer Nature B.V., 01.05.2007
Subjects	Algebra; Classification; Cluster analysis; Clustering; Data analysis; Data mining; Information systems; Knowledge discovery; Lattices; Methods; Optimization algorithms; Random variables; Redundancy; Set theory; Studies
ISSN	0219-1377 0219-3116
DOI	10.1007/s10115-006-0009-7

Summary:	Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We discuss extensions of the technique to the tasks of semi-supervised classification and enumeration of successive non-redundant clusterings. We present experimental results for applications in text mining and computer vision.
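
The conditional mutual information score mentioned in the summary can be illustrated with a small sketch. The notation here is an assumption and does not come from this record: C stands for a cluster assignment, Y for the relevant features, and Z for the already known grouping. The function only evaluates I(C;Y|Z) for a given discrete joint distribution. It does not implement the constrained optimization or the alternating scheme described in the article.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """Return I(C;Y|Z) in nats for a dict {(c, y, z): probability}.

    C, Y, Z are illustrative names (assumed, not taken from the record):
    C = cluster label, Y = relevant feature, Z = known grouping.
    """
    p_z = defaultdict(float)
    p_cz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals needed for the conditional terms.
    for (c, y, z), p in joint.items():
        p_z[z] += p
        p_cz[(c, z)] += p
        p_yz[(y, z)] += p

    total = 0.0
    for (c, y, z), p in joint.items():
        if p > 0.0:
            # p(c,y|z) / (p(c|z) p(y|z)) = p(c,y,z) p(z) / (p(c,z) p(y,z))
            total += p * math.log(p * p_z[z] / (p_cz[(c, z)] * p_yz[(y, z)]))
    return total


# Toy case 1: the clustering C simply copies the known grouping Z,
# so it carries no information about Y beyond what Z already gives.
redundant = {(z, y, z): 0.25 for z in (0, 1) for y in (0, 1)}

# Toy case 2: the clustering C tracks Y and is independent of Z,
# so it reveals structure that the known grouping does not explain.
novel = {(y, y, z): 0.25 for y in (0, 1) for z in (0, 1)}

score_redundant = conditional_mutual_information(redundant)
score_novel = conditional_mutual_information(novel)
```

In these toy distributions the redundant clustering scores zero, while the novel one scores log 2 nats, which is the sense in which conditioning on known structure penalizes redundant solutions.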