Clustering Algorithm by Boundary Detection Base on Entropy of KNN

Bibliographic Details
Published in: IEEE Access, Vol. 13, pp. 62353-62366
Main Authors: Ding, Jiaman; Yin, Jinyuan; Jia, Lianyin; Fu, Xiaodong; Wang, Hongbin
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3555772

Summary: Clustering analysis has been widely applied in various fields, and clustering algorithms based on boundary detection have proven effective. In this work, we propose a clustering algorithm by boundary detection based on entropy of KNN (CBDEK). The nearest neighbors of a border point lie only within a specific directional range. Thus, we define the entropy of KNN (EK) to accurately identify the boundary of a cluster. Since entropy measures uncertainty, it can be used to quantify the possibility that a point is a border point: a lower EK indicates an uneven neighbor distribution, which increases the possibility that the point is a border point. The border points are then clustered based on the directional similarity of their nearest neighbors. Specifically, if a border point is a neighbor of another border point and most of their nearest neighbors show directional similarity, the two points are considered to belong to the same cluster. Furthermore, to allocate the remaining (interior) points efficiently, we assign the label of a border point to the interior points located within its maximum nearest neighbor sub-block. In addition, our algorithm incorporates noise mitigation techniques using average distance and box plot analysis. The effectiveness of CBDEK is demonstrated by a comparative evaluation of nine algorithms on 24 datasets.
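The summary does not give the formula for EK, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes 2-D data, bins the directions from a point to its k nearest neighbors into equal angular sectors, and takes the Shannon entropy of that histogram as a stand-in score. The box plot noise rule is likewise assumed to be the common upper fence Q3 + 1.5 IQR applied to the average KNN distance. All function names and the parameters `k` and `n_bins` are hypothetical.

```python
import numpy as np


def direction_entropy(offsets, n_bins=8):
    """Shannon entropy (in bits) of the angular histogram of 2-D neighbor offsets.

    Assumed stand-in for EK: neighbors spread over all directions give a high
    value, neighbors confined to a narrow directional range give a low value.
    """
    angles = np.arctan2(offsets[:, 1], offsets[:, 0])  # angles in [-pi, pi]
    counts, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def knn_entropy_scores(points, k=12, n_bins=8):
    """Return (entropy, mean KNN distance) for every point, by brute force."""
    points = np.asarray(points, dtype=float)
    # diff[i, j] is the vector from point i to point j.
    diff = points[None, :, :] - points[:, None, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]
    rows = np.arange(len(points))[:, None]
    entropy = np.array(
        [direction_entropy(diff[i, knn[i]], n_bins) for i in range(len(points))]
    )
    mean_dist = dist[rows, knn].mean(axis=1)
    return entropy, mean_dist


def boxplot_upper_outliers(values):
    """Flag values above the upper box plot fence Q3 + 1.5 * IQR (assumed rule)."""
    q1, q3 = np.percentile(values, [25, 75])
    return values > q3 + 1.5 * (q3 - q1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((500, 2))  # uniform points in the unit square
    entropy, mean_dist = knn_entropy_scores(pts)
    noise = boxplot_upper_outliers(mean_dist)
    # Lowest-entropy points that are not flagged as noise are border candidates.
    candidates = np.argsort(np.where(noise, np.inf, entropy))[:20]
    print(pts[candidates])
```

In this sketch, points near the edge of the square see neighbors in roughly a half-plane and therefore tend to score lower than points in the middle, which matches the intuition stated in the summary. How CBDEK thresholds EK, groups border points, and labels interior points is not specified here and is left out.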