Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 36, no. 11, pp. 6653-6666
Main Authors	Li, Dong; Zhou, Shuisheng; Zeng, Tieyong; Chan, Raymond H.
Format	Journal Article
Language	English
Published	IEEE, 1 November 2024
Subjects	Approximation algorithms; Clustering algorithms; Convex merging; Costs; K-means; Merging; multi-prototypes; multi-prototypes sampling; Partitioning algorithms; Prototypes; Shape
ISSN	1041-4347; 1558-2191
DOI	10.1109/TKDE.2023.3342209

Summary:	The K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it easily gets stuck in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select the appropriate number of multi-prototypes for data with arbitrary shapes. Then, a merging technique called convex merging (CM) merges the multi-prototypes to obtain a better local minimum without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with the K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of the K-Means problem without k being given first. Two theoretical proofs guarantee that the cost of MCKM (MPS+CM) achieves a constant-factor approximation to the optimal cost of the K-Means problem. Experimental results on synthetic and real-world data sets verify the effectiveness of the proposed algorithm.
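The summary describes a two-stage idea: over-partition the data with more prototypes than clusters, then merge the prototypes and read off k as the number of merged groups. The following pure-Python sketch illustrates only that general pipeline, under stated assumptions. Plain Lloyd's K-Means with a user-chosen number of prototypes stands in for MPS. Farthest-first seeding is an assumed initialization. A single-linkage union of prototypes closer than a distance threshold is a swapped-in substitute for convex merging, not the paper's CM. All function names and the parameters `num_prototypes` and `threshold` are hypothetical. The sketch also computes the K-Means cost (the sum of squared distances to cluster centroids), which is the quantity the paper's constant-factor guarantee bounds.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def farthest_first(points: List[Point], m: int) -> List[Point]:
    """Assumed deterministic seeding: farthest-first traversal."""
    centers = [points[0]]
    while len(centers) < m:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def lloyd(points: List[Point], m: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Standard Lloyd's K-Means with m prototypes (stand-in for MPS, assumption)."""
    centers = farthest_first(points, m)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else c)  # keep empty prototypes
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def merge_prototypes(centers: List[Point], threshold: float) -> List[int]:
    """Hypothetical stand-in for convex merging: union prototypes closer than
    `threshold` (single linkage). Returns a group id for each prototype."""
    parent = list(range(len(centers)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if sq_dist(centers[i], centers[j]) < threshold ** 2:
                parent[find(i)] = find(j)
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(centers))]


def kmeans_cost(points: List[Point], labels: List[int]) -> float:
    """K-Means cost: squared distance of each point to its cluster centroid."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cents = {lab: mean(g) for lab, g in groups.items()}
    return sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))


def multi_prototype_then_merge(points: List[Point], num_prototypes: int,
                               threshold: float) -> Tuple[List[int], int]:
    """Over-cluster, merge prototypes, and estimate k as the number of groups."""
    protos, proto_labels = lloyd(points, num_prototypes)
    group_of = merge_prototypes(protos, threshold)
    labels = [group_of[lab] for lab in proto_labels]
    return labels, len(set(labels))


if __name__ == "__main__":
    grid = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    data = [(0.0 + dx, 0.0 + dy) for dx, dy in grid] + \
           [(10.0 + dx, 10.0 + dy) for dx, dy in grid]
    labels, k_hat = multi_prototype_then_merge(data, num_prototypes=6, threshold=3.0)
    print("estimated k:", k_hat)  # estimated k: 2
    print("merged cost:", kmeans_cost(data, labels))  # merged cost: 24.0
```

On two well-separated 3×3 grids of points, the six prototypes each stay inside one grid and merge into two groups, so the estimated k is 2. The threshold rule is used here only for brevity; unlike the paper's CM, it carries no optimality or approximation guarantee and depends directly on the chosen threshold.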