Genetic intuitionistic weighted fuzzy k-modes algorithm for categorical data

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 330, pp. 116-126
Main Authors: Kuo, R.J.; Nguyen, Thi Phuong Quyen
Format: Journal Article
Language: English
Published: Elsevier B.V., 22.02.2019
ISSN: 0925-2312 (print), 1872-8286 (online)
DOI: 10.1016/j.neucom.2018.11.016

Summary:
• Employ intuitionistic fuzzy set theory in fuzzy clustering for categorical attributes.
• Use a new similarity measure for categorical data, based on the frequency probability-based distance metric, to calculate dissimilarity.
• Weight each categorical attribute by its importance, updating the attribute weights iteratively during the clustering process.
• Search for the global optimal solution with a genetic algorithm (GA).
• Provide an unsupervised feature selection process that removes redundant features from the original dataset before the GA process runs.

Data clustering with categorical attributes has been widely used in many real-world applications. Most existing clustering algorithms for categorical data share two major drawbacks: they terminate at a local optimal solution, and they treat all attributes as equally important. This study therefore proposes a novel clustering method, the genetic intuitionistic weighted fuzzy k-modes (GIWFKM) algorithm, based on the conventional fuzzy k-modes algorithm and the genetic algorithm (GA). The study first introduces the intuitionistic weighted fuzzy k-modes (IWFKM) algorithm, which employs the intuitionistic fuzzy set in the clustering process together with the new similarity measure for categorical data based on the frequency probability-based distance metric. The GIWFKM algorithm, which integrates the IWFKM algorithm with GA, is then proposed to search for the global optimal solution. Moreover, the GIWFKM algorithm performs unsupervised feature selection based on the correlation coefficient to remove redundant features, which can both improve clustering performance and reduce computation time.
To evaluate the clustering results, a series of experiments on different categorical datasets compares the performance of the proposed algorithms with that of benchmark algorithms including fuzzy k-modes, weighted fuzzy k-modes, genetic fuzzy k-modes, space structure-based clustering, and many-objective fuzzy centroids clustering. Experimental results on datasets from the UCI Machine Learning Repository show that the GIWFKM algorithm outperforms the benchmark algorithms in terms of Adjusted Rand Index (ARI) and clustering accuracy (CA).
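
For readers unfamiliar with the baseline that the summary builds on, the following is a minimal Python sketch of one membership-update step of a conventional fuzzy k-modes algorithm with per-attribute weights. It is not the GIWFKM or IWFKM algorithm: it uses a plain weighted simple-matching dissimilarity instead of the paper's frequency probability-based measure, and it has no intuitionistic fuzzy term, no weight update, no GA, and no feature selection. The function names, the fuzziness exponent `alpha`, and the toy data are all assumptions made for illustration. Splitting membership evenly among several modes that coincide with an object is likewise an assumption of this sketch.

```python
import numpy as np

def weighted_matching_distance(X, modes, weights):
    """Weighted simple-matching dissimilarity (illustrative, not the paper's measure).

    X: (n, m) categorical objects, modes: (k, m) cluster modes,
    weights: (m,) non-negative attribute weights. Returns an (n, k) array.
    """
    mismatches = X[:, None, :] != modes[None, :, :]  # (n, k, m) boolean
    return (mismatches * weights).sum(axis=2)

def fuzzy_memberships(dist, alpha=1.5, eps=1e-12):
    """Membership update of conventional fuzzy k-modes for fuzziness alpha > 1."""
    n, k = dist.shape
    u = np.zeros((n, k))
    zero = dist <= eps
    has_zero = zero.any(axis=1)
    # Objects that coincide with a mode get full membership there
    # (shared evenly if several modes coincide; an assumption of this sketch).
    u[has_zero] = zero[has_zero] / zero[has_zero].sum(axis=1, keepdims=True)
    # Remaining objects: membership proportional to (1 / d)^(1 / (alpha - 1)).
    inv = (1.0 / dist[~has_zero]) ** (1.0 / (alpha - 1.0))
    u[~has_zero] = inv / inv.sum(axis=1, keepdims=True)
    return u

# Toy data (hypothetical): three objects, two attributes, two modes.
X = np.array([["a", "x"], ["a", "y"], ["b", "y"]])
modes = np.array([["a", "x"], ["b", "y"]])
weights = np.array([0.5, 0.5])

dist = weighted_matching_distance(X, modes, weights)
u = fuzzy_memberships(dist)
print(u)
```

In this toy example the first and third objects coincide with a mode and receive full membership in that cluster, while the second object is equally distant from both modes and is split evenly between them.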