A novel density peaks clustering algorithm based on Hopkins statistic

Bibliographic Details
Published in: Expert systems with applications Vol. 201; p. 116892
Main Authors: Zhang, Ruilin; Miao, Zhenguo; Tian, Ye; Wang, Hongpeng
Format: Journal Article
Language: English
Published: New York, Elsevier Ltd, 01.09.2022
Elsevier BV
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.116892

Summary: Density peaks clustering (DPC) is a promising algorithm because it is straightforward and easy to implement. However, most of its improvements still rely on expert knowledge, strong prior information, or complex iterations to identify the cluster centers, which inevitably adds subjectivity and instability. Moreover, crisp and sensitive density metrics sometimes reduce the representativeness of the centers, resulting in poor clustering. To this end, we propose an enhanced algorithm, called density peaks clustering based on Hopkins Statistic (DPC-AHS). Its main feature is the automatic identification of cluster centers without prior information. Specifically, with a two-stage strategy, we first specify some objects as candidate centers by linear regression and residual analysis. Subsequently, inspired by the idea of optimization, we design a novel validity index (AHS) that replaces the original decision graph in selecting the desired centers from the candidates. Another novel component of DPC-AHS is the proposed adjusted k-nearest neighbors (A-kNN), which dynamically defines the neighbors during the process and further enhances robustness against outliers. Finally, we compare the performance of DPC-AHS with 7 state-of-the-art methods on synthetic, UCI, and image datasets. Experiments on 25 datasets and in-depth case discussions from 5 perspectives demonstrate that our algorithm is feasible and effective in clustering and center identification.
• A novel density peaks clustering algorithm based on Hopkins Statistic (DPC-AHS) is proposed.
• DPC-AHS can automatically find clusters and centers without manual participation.
• A cluster validity index, AHS, with low complexity is designed to evaluate clustering.
• Experiments and discussions on various datasets show the effectiveness of our method.
• DPC-AHS requires only one parameter and can be applied to high-dimensional data.
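For readers unfamiliar with the two building blocks named in the summary, the following is a minimal Python sketch of the classical DPC quantities (local density and distance to the nearest denser point) and of one common form of the classical Hopkins statistic. It is an illustration under stated assumptions, not the DPC-AHS method: the AHS index, the regression-based candidate selection, and A-kNN are not specified in this record and are not reproduced here. The function names, the Gaussian kernel, the cutoff `dc`, the sample size `m`, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def dpc_quantities(X, dc):
    """Classical DPC quantities (not the DPC-AHS variant).

    rho_i  : Gaussian-kernel local density with cutoff distance dc.
    delta_i: distance to the nearest point of higher density; for the
             densest point, the largest distance to any other point.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # drop the self term
    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.empty(len(X))
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, len(X)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()
    return rho, delta


def hopkins_statistic(X, m, rng):
    """One common form of the classical Hopkins statistic.

    H = sum(u) / (sum(u) + sum(w)), where
    u: distances from m uniform random probes (drawn in the bounding
       box of X) to their nearest data point;
    w: distances from m sampled data points to their nearest other
       data point.
    H near 0.5 suggests no cluster structure; H near 1 suggests clusters.
    """
    n, dim = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    probes = rng.uniform(lo, hi, size=(m, dim))
    u = np.linalg.norm(probes[:, None, :] - X[None, :, :], axis=-1).min(axis=1)
    idx = rng.choice(n, size=m, replace=False)
    dw = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=-1)
    dw[np.arange(m), idx] = np.inf  # exclude each sample's distance to itself
    w = dw.min(axis=1)
    return u.sum() / (u.sum() + w.sum())


# Toy data (assumed for illustration): two tight, well-separated blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(0.0, 0.1, (50, 2)),
    rng.normal(5.0, 0.1, (50, 2)),
])

rho, delta = dpc_quantities(blobs, dc=0.2)
gamma = rho * delta                # classical decision-graph score
centers = np.argsort(-gamma)[:2]   # the number of centers is given by hand here
H = hopkins_statistic(blobs, m=20, rng=rng)
print("center indices:", centers, "Hopkins statistic:", round(float(H), 3))
```

In this sketch the number of centers is supplied manually through the slice `[:2]`, which is exactly the kind of manual step the summary says DPC-AHS is designed to remove.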