Extending k-Means-Based Algorithms for Evolving Data Streams with Variable Number of Clusters

Bibliographic Details
Published in	2011 Tenth International Conference on Machine Learning and Applications, Vol. 2, pp. 14-19
Main Authors	de Andrade Silva, J., Hruschka, E. R.
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2011
Subjects	Approximation algorithms; Clustering; Clustering algorithms; Data Stream; Heuristic algorithms; Indexes; Machine learning algorithms; Online Clustering; Partitioning algorithms; Prototypes
ISBN	9781457721342 1457721341
DOI	10.1109/ICMLA.2011.67

More Information
Summary:	Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. To relax this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that estimates k automatically from the data. We illustrate the potential of the proposed framework by combining three state-of-the-art algorithms for clustering data streams (Stream LSearch, CluStream, and Stream KM++) with two well-known algorithms for estimating the number of clusters: Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. The combination of Stream KM++ with OMRk offers the best trade-off between accuracy and efficiency.
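
The summary names Ordered Multiple Runs of k-Means (OMRk) as one way to estimate k, but this record gives no algorithmic detail. The following is only a rough batch sketch of the general idea: run k-means several times for each candidate k and keep the partition that scores best on a validity index. The choice of a simplified silhouette as that index, the parameter defaults, and every function name (kmeans, simplified_silhouette, omrk) are assumptions made for illustration and are not taken from the paper. The stream-summarization step (Stream LSearch, CluStream, or Stream KM++) is not shown.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    centroids = rng.sample(points, k)  # random initial prototypes
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels


def simplified_silhouette(points, centroids, labels):
    """Centroid-based silhouette (assumed validity index); requires k >= 2."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centroids[lab])  # distance to own centroid
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def omrk(points, k_min=2, k_max=6, runs=5, seed=0):
    """Try each k in ascending order, several runs each; keep the best partition.

    Returns (score, k, centroids, labels). k_min must be at least 2 and
    k_max must not exceed the number of points.
    """
    rng = random.Random(seed)
    best = (-math.inf, None, None, None)
    for k in range(k_min, k_max + 1):
        for _ in range(runs):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if score > best[0]:
                best = (score, k, centroids, labels)
    return best
```

A call such as omrk(points, k_min=2, k_max=6) returns the best score found together with the corresponding k, centroids, and labels. Because each run starts from a random initialization, more runs per k make it more likely that a good partition is found for every candidate k, at a higher computational cost.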