Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 36, no. 11, pp. 6653-6666
Main Authors: Li, Dong; Zhou, Shuisheng; Zeng, Tieyong; Chan, Raymond H.
Format: Journal Article
Language: English
Published: IEEE, 01.11.2024
ISSN: 1041-4347; 1558-2191
DOI: 10.1109/TKDE.2023.3342209

More Information
Summary: The K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it easily gets stuck in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select an appropriate number of multi-prototypes for data with arbitrary shapes. Then, a merging technique called convex merging (CM) merges the multi-prototypes to reach a better local minimum without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with the K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of the K-Means problem without k being given first. Two theoretical proofs are given to guarantee that the cost of MCKM (MPS+CM) can achieve a constant-factor approximation to the optimal cost of the K-Means problem. Experimental results on synthetic and real-world data sets have verified the effectiveness of the proposed algorithm.
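The general "many prototypes first, merge afterwards" idea behind the summary can be illustrated with a minimal sketch. This is NOT the paper's MPS or CM: it swaps in plain Lloyd iterations with a hand-picked number of prototypes m (larger than k), followed by greedy Ward-style merging, which at each step joins the two groups whose merge increases the K-Means cost (sum of squared distances to the nearest prototype) the least. Unlike MCKM, this sketch still needs k as an input. The function names `lloyd_kmeans`, `ward_merge` and `overcluster_then_merge` are hypothetical.

```python
import numpy as np

def lloyd_kmeans(X, m, n_iter=50, seed=0):
    """Plain Lloyd iterations with m prototypes initialised at random data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest prototype (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each non-empty prototype to the mean of its assigned points,
        # so the returned centers are exact means of the returned labels.
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

def ward_merge(centers, counts, k):
    """Greedily merge prototypes until k groups remain (Ward's criterion, not CM).

    Merging groups a and b raises the K-Means cost by
    n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, so the cheapest pair is merged.
    """
    groups = [[i] for i in range(len(centers))]
    cents = [np.asarray(c, dtype=float).copy() for c in centers]
    sizes = [float(n) for n in counts]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                na, nb = sizes[a], sizes[b]
                cost = na * nb / (na + nb) * np.sum((cents[a] - cents[b]) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = sizes[a], sizes[b]
        # The merged group is represented by the size-weighted mean.
        cents[a] = (na * cents[a] + nb * cents[b]) / (na + nb)
        sizes[a] = na + nb
        groups[a] += groups[b]
        del groups[b], cents[b], sizes[b]  # b > a, so index a stays valid
    return groups

def overcluster_then_merge(X, m, k, seed=0):
    """Run K-Means with m > k prototypes, then merge them down to k clusters."""
    centers, labels = lloyd_kmeans(X, m, seed=seed)
    # Drop prototypes that ended up with no points.
    keep = [j for j in range(m) if np.any(labels == j)]
    counts = [np.sum(labels == j) for j in keep]
    groups = ward_merge(centers[keep], counts, k)
    proto_to_cluster = {keep[idx]: g for g, members in enumerate(groups) for idx in members}
    return np.array([proto_to_cluster[l] for l in labels])
```

For example, on two well-separated Gaussian blobs, `overcluster_then_merge(X, m=10, k=2)` first places ten prototypes inside the blobs and then merges them into one cluster per blob. Choosing m and k by hand is exactly what MPS and CM are described as avoiding.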