Intuitive-K-prototypes: A mixed data clustering algorithm with intuitionistic distribution centroid

Bibliographic Details
Published in: Pattern recognition Vol. 158; p. 111062
Main Authors: Wang, Hongli; Mi, Jusheng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2025
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2024.111062

Summary: Data sets are usually mixed with numerical and categorical attributes in the real world. Data mining of mixed data makes a lot of sense. This paper proposes an Intuitive-K-prototypes clustering algorithm with improved prototype representation and attribute weights. The proposed algorithm defines intuitionistic distribution centroid for categorical attributes. In our approach, a heuristic search for initial prototypes is performed. Then, we combine the mean of numerical attributes and intuitionistic distribution centroid to represent the cluster prototype. In addition, intra-cluster complexity and inter-cluster similarity are used to adjust attribute weights, with higher priority given to those with lower complexity and similarity. The membership and non-membership distance are calculated using the intuitionistic distribution centroid. These distances are then combined parametrically to obtain the composite distance. The algorithm is judged for its clustering effectiveness on the real UCI data set, and the results show that the proposed algorithm outperforms the traditional clustering algorithm in most cases.
• Propose a method for initial prototypes based on the approximate farthest distance.
• Propose the concept of the intuitionistic distribution centroid.
• Construct attribute weights by the similarity of inter-cluster attributes.
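
The summary describes a cluster prototype built from the mean of the numerical attributes and a distribution centroid for the categorical attributes. The sketch below illustrates only that general idea with a plain frequency-based distribution centroid; it is not the paper's intuitionistic variant, whose membership and non-membership definitions are not given in this record. The function names `build_prototype` and `distance`, the trade-off weight `gamma`, and the mismatch measure (1 minus the relative frequency) are all assumptions made for illustration.

```python
from collections import Counter


def build_prototype(num_rows, cat_rows):
    """Return (means, distributions) for one cluster.

    num_rows: list of tuples holding the numerical attributes of each object.
    cat_rows: list of tuples holding the categorical attributes of each object.
    """
    n = len(num_rows)
    # Numerical part: per-attribute mean, as in classic k-prototypes.
    means = [sum(col) / n for col in zip(*num_rows)]
    # Categorical part: one relative-frequency distribution per attribute
    # (a plain distribution centroid; ASSUMPTION, not the paper's definition).
    distributions = [
        {cat: count / n for cat, count in Counter(col).items()}
        for col in zip(*cat_rows)
    ]
    return means, distributions


def distance(x_num, x_cat, prototype, gamma=1.0):
    """Distance from one object to a prototype.

    Numerical part: squared Euclidean distance to the means.
    Categorical part: sum over attributes of (1 - relative frequency of the
    object's category in the cluster). gamma is a hypothetical weight that
    balances the two parts.
    """
    means, distributions = prototype
    d_num = sum((a - m) ** 2 for a, m in zip(x_num, means))
    d_cat = sum(1.0 - dist.get(v, 0.0) for v, dist in zip(x_cat, distributions))
    return d_num + gamma * d_cat


# Toy cluster with two objects, two numerical and two categorical attributes.
proto = build_prototype(
    [(1.0, 2.0), (3.0, 4.0)],
    [("red", "s"), ("red", "m")],
)
near = distance((2.0, 3.0), ("red", "s"), proto)
far = distance((2.0, 3.0), ("blue", "s"), proto)
```

In this toy cluster, an object whose categories are frequent in the cluster (`near`) gets a smaller distance than one with an unseen category (`far`). The paper's composite distance additionally combines membership and non-membership distances parametrically and weights attributes, which this sketch does not attempt.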