Incremental Clustering and Dynamic Information Retrieval

Bibliographic Details
Published in: SIAM Journal on Computing, Vol. 33, no. 6, pp. 1417-1440
Main Authors: Charikar, Moses; Chekuri, Chandra; Feder, Tomas; Motwani, Rajeev
Format: Journal Article
Language: English
Published: Philadelphia, PA: Society for Industrial and Applied Mathematics, 01.01.2004
ISSN: 0097-5397, 1095-7111
DOI: 10.1137/S0097539702418498

Summary: Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance, and which we believe should also perform well in practice. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
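
Illustration: the summary does not spell out any algorithm, so the following is only a minimal sketch of what the incremental setting could look like in code, not the article's method. It assumes points are coordinate tuples in Euclidean space, that at most k clusters may exist at any time, and that the only operations allowed are assigning a new point to a cluster and merging existing clusters. The class name IncrementalClusterer, the threshold-doubling rule, and the merge order are all hypothetical choices made for this example, and no performance guarantee is claimed for it.

```python
from math import dist  # Euclidean distance between coordinate sequences (Python 3.8+)


class IncrementalClusterer:
    """Hypothetical sketch: keep at most k clusters while points arrive one at a time."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.radius = 0.0   # current assignment threshold (assumed rule)
        self.clusters = []  # list of (center, members) pairs

    def insert(self, point):
        # Assign the point to the first cluster whose center is close enough.
        for center, members in self.clusters:
            if dist(center, point) <= self.radius:
                members.append(point)
                return
        # Otherwise open a new cluster centered at the point.
        self.clusters.append((point, [point]))
        # Too many clusters: grow the threshold and merge until at most k remain.
        while len(self.clusters) > self.k:
            self._grow_and_merge()

    def _grow_and_merge(self):
        if self.radius == 0.0:
            # First merge: start from the smallest distance between two centers.
            centers = [c for c, _ in self.clusters]
            self.radius = min(
                dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
            )
        else:
            self.radius *= 2
        merged = []
        for center, members in self.clusters:
            for kept_center, kept_members in merged:
                if dist(kept_center, center) <= self.radius:
                    # Merge this cluster into one that was already kept.
                    kept_members.extend(members)
                    break
            else:
                merged.append((center, members))
        self.clusters = merged
```

A caller would create IncrementalClusterer(k) once and call insert(point) for each arriving point. Existing clusters are only ever merged, never split, which is the restriction that separates this setting from re-clustering the whole point set after every insertion.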