K-sets and k-swaps algorithms for clustering sets
| Published in | Pattern recognition Vol. 139; p. 109454 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.07.2023 |
| Subjects | |
| ISSN | 0031-3203, 1873-5142 |
| DOI | 10.1016/j.patcog.2023.109454 |
| Summary: | • Novel k-sets algorithm that generalizes k-means to set data. • Novel k-swaps algorithm to avoid local minima. • Benchmark for evaluating clustering of set data. • Case study on clustering patients based on their ICD-10 diagnoses.
We present two new clustering algorithms, k-sets and k-swaps, for data where each object is a set. First, we define the mean of the sets in a cluster and the distance between a set and the mean. We then derive the k-sets algorithm from the principles of classical k-means, so that it repeats the assignment and update steps until convergence. To the best of our knowledge, the proposed algorithm is the first k-means-based algorithm for this kind of data. We also adapt the idea to the random swap algorithm, a wrapper around k-means that avoids local minima; this variant is called k-swaps. We show by experiments that this algorithm provides more accurate clustering results than k-medoids and other competitive methods. |
|---|---|
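The summary describes an assignment/update loop over sets and a random-swap wrapper around it, but this record does not give the paper's definitions of the cluster mean or the set-to-mean distance. The Python sketch below is therefore only an illustration under stated assumptions: the cluster representative is taken to be the relative frequency of each item in the cluster, and the distance is a weighted Jaccard distance. Both are stand-ins chosen for this example, and all function names (`representative`, `distance`, `k_sets`, `k_swaps`) are hypothetical rather than the authors' implementation.

```python
import random
from collections import Counter

def representative(sets):
    """ASSUMPTION: cluster 'mean' = relative frequency of each item in the cluster."""
    counts = Counter(item for s in sets for item in s)
    return {item: c / len(sets) for item, c in counts.items()}

def distance(s, rep):
    """ASSUMPTION: weighted Jaccard distance (plain Jaccard if rep is 0/1-valued)."""
    shared = sum(rep.get(item, 0.0) for item in s)
    union = len(s) + sum(rep.values()) - shared
    return 1.0 - shared / union if union > 0 else 0.0

def assign(data, reps):
    """Assignment step: index of the nearest representative for every set."""
    return [min(range(len(reps)), key=lambda j: distance(s, reps[j])) for s in data]

def cost(data, reps, labels):
    """Total distance of every set to its assigned representative."""
    return sum(distance(s, reps[j]) for s, j in zip(data, labels))

def k_sets(data, reps, max_iter=100):
    """Repeat update and assignment steps until the labels stop changing."""
    labels = assign(data, reps)
    for _ in range(max_iter):
        new_reps = []
        for j in range(len(reps)):
            members = [s for s, l in zip(data, labels) if l == j]
            # An empty cluster keeps its previous representative.
            new_reps.append(representative(members) if members else reps[j])
        reps = new_reps
        new_labels = assign(data, reps)
        if new_labels == labels:
            break
        labels = new_labels
    return reps, labels

def k_swaps(data, k, swaps=50, seed=0):
    """Random-swap wrapper: relocate one representative, fine-tune, keep if better."""
    rng = random.Random(seed)
    initial = [representative([s]) for s in rng.sample(data, k)]
    reps, labels = k_sets(data, initial)
    best = cost(data, reps, labels)
    for _ in range(swaps):
        trial = list(reps)
        trial[rng.randrange(k)] = representative([rng.choice(data)])
        trial, trial_labels = k_sets(data, trial, max_iter=2)
        trial_cost = cost(data, trial, trial_labels)
        if trial_cost < best:  # accept the swap only when it lowers the cost
            reps, labels, best = trial, trial_labels, trial_cost
    return reps, labels, best
```

A call such as `k_swaps([frozenset("abc"), frozenset("abd"), frozenset("xyz"), frozenset("xyw")], k=2)` returns the representatives, the label of each set, and the total cost. Accepting a swap only when the cost decreases means the wrapper never ends with a worse solution than the one it started from, which is the sense in which the random-swap idea is used here.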