Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm
| Published in | IEEE Transactions on Knowledge and Data Engineering, Vol. 36, no. 11, pp. 6653-6666 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | IEEE, 01.11.2024 |
| ISSN | 1041-4347 1558-2191 |
| DOI | 10.1109/TKDE.2023.3342209 |
| Summary: | The K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters $k$ has to be given a priori. To solve these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select the appropriate number of multi-prototypes for data with arbitrary shapes. Then, a merging technique, called convex merging (CM), merges the multi-prototypes to get a better local minimum without $k$ being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct $k$. By integrating these two techniques with the K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of the K-Means problem without $k$ being given first. Two theoretical proofs are given to guarantee that the cost of MCKM (MPS+CM) can achieve a constant factor approximation to the optimal cost of the K-Means problem. Experimental results on synthetic and real-world data sets have verified the effectiveness of the proposed algorithm. |
|---|---|
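
The first limitation named in the summary, convergence to a spurious local minimum, can be reproduced on a toy data set. The sketch below is not the MCKM, MPS, or CM procedure from the paper; it is only a plain Lloyd-style K-Means on 1-D points, with hypothetical names (`lloyd`, `points`, `bad_init`, `good_init`) and hand-picked data chosen for illustration. It shows that two initializations with the same $k$ can converge to very different costs.

```python
def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data; returns (centers, cost).

    Illustrative helper only, not part of any published algorithm.
    """
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:  # fixed point reached
            break
        centers = new_centers
    # K-Means cost: sum of squared distances to the nearest center.
    cost = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, cost


# Three well-separated groups of two points each (assumed toy data).
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

# Two centers start inside the first group, one between the other two groups.
bad_init = [0.0, 1.0, 15.5]
# One center starts inside each group.
good_init = [0.0, 10.0, 20.0]

bad_centers, bad_cost = lloyd(points, bad_init)
good_centers, good_cost = lloyd(points, good_init)

print(bad_centers, bad_cost)    # [0.0, 1.0, 15.5] 101.0
print(good_centers, good_cost)  # [0.5, 10.5, 20.5] 1.5
```

With the poor initialization the iteration is already at a fixed point, so the two right-hand groups stay merged under one center and the first group stays split, even though a far cheaper solution exists for the same $k$.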