K-sets and k-swaps algorithms for clustering sets

Bibliographic Details
Published in Pattern Recognition, Vol. 139; p. 109454
Main Authors Rezaei, Mohammad, Fränti, Pasi
Format Journal Article
Language English
Published Elsevier Ltd 01.07.2023
ISSN 0031-3203, 1873-5142
DOI 10.1016/j.patcog.2023.109454

Summary:
• Novel K-sets algorithm that generalizes k-means to work for sets data.
• Novel K-swaps algorithm to avoid local minima.
• Benchmark for evaluating clustering of sets data.
• Case study with clustering patients based on their ICD-10 diagnoses.

We present two new clustering algorithms, called k-sets and k-swaps, for data where each object is a set. First, we define the mean of the sets in a cluster and the distance between a set and the mean. We then derive the k-sets algorithm from the principles of classical k-means, so that it repeats the assignment and update steps until convergence. To the best of our knowledge, the proposed algorithm is the first k-means based algorithm for this kind of data. We also adopt the idea in the random swap algorithm, which is a wrapper around k-means that avoids local minima. This variant is called k-swaps. We show by experiments that this algorithm provides more accurate clustering results than k-medoids and other competitive methods.
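The assignment/update loop and the random-swap wrapper described in the summary can be illustrated with a minimal Python sketch. The summary does not state how the mean of a cluster or the set-to-mean distance is defined, so both are stand-ins here: the "mean" is assumed to be a table of relative item frequencies, and the distance is assumed to be one minus a weighted Jaccard similarity. The function names (`mean_of_sets`, `distance`, `k_sets`, `k_swaps`) and parameters (`swaps`, `seed`, `max_iter`) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

def mean_of_sets(sets):
    """ASSUMPTION: the cluster 'mean' is the relative frequency of each item."""
    counts = Counter(item for s in sets for item in s)
    n = len(sets)
    return {item: c / n for item, c in counts.items()}

def distance(s, mean):
    """ASSUMPTION: 1 - weighted Jaccard between the set and the frequency table."""
    inter = sum(mean.get(item, 0.0) for item in s)
    union = len(s) + sum(f for item, f in mean.items() if item not in s)
    return 1.0 - inter / union if union else 0.0

def assign(data, means):
    """Assignment step: each set goes to its nearest mean."""
    return [min(range(len(means)), key=lambda j: distance(s, means[j]))
            for s in data]

def update(data, labels, means):
    """Update step: recompute each mean; an empty cluster keeps its old mean."""
    new = []
    for j, old in enumerate(means):
        members = [s for s, lab in zip(data, labels) if lab == j]
        new.append(mean_of_sets(members) if members else old)
    return new

def cost(data, labels, means):
    """Sum of distances from each set to the mean of its assigned cluster."""
    return sum(distance(s, means[lab]) for s, lab in zip(data, labels))

def k_sets(data, means, max_iter=100):
    """Repeat update and assignment until the labels stop changing."""
    labels = assign(data, means)
    for _ in range(max_iter):
        means = update(data, labels, means)
        new_labels = assign(data, means)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, means

def k_swaps(data, k, swaps=50, seed=0):
    """Random-swap wrapper: replace one mean with a random data set,
    fine-tune with a few k_sets iterations, keep the trial only if cost drops."""
    rng = random.Random(seed)
    means = [mean_of_sets([s]) for s in rng.sample(data, k)]
    labels, means = k_sets(data, means)
    best = cost(data, labels, means)
    for _ in range(swaps):
        trial = list(means)
        trial[rng.randrange(k)] = mean_of_sets([rng.choice(data)])
        t_labels, t_means = k_sets(data, trial, max_iter=2)
        t_cost = cost(data, t_labels, t_means)
        if t_cost < best:
            labels, means, best = t_labels, t_means, t_cost
    return labels, means, best
```

For example, calling `k_swaps(data, 2)` on a list of Python sets returns one cluster label per set, the final frequency tables, and the total cost. Because a swap is accepted only when it lowers the cost, the wrapper can never end worse than its initial k_sets run, which is the sense in which random swap helps escape poor local minima.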