Partition-and-merge based fuzzy genetic clustering algorithm for categorical data
| Published in | Applied Soft Computing, Vol. 75, pp. 254-264 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V., 01.02.2019 |
| ISSN | 1568-4946 1872-9681 |
| DOI | 10.1016/j.asoc.2018.11.028 |
| Summary: | Categorical data clustering is challenging because categorical attributes have no natural order. This study proposes a two-stage method for categorical data, the partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA). In the first stage, PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters. The merge stage then selects two of the clusters generated in the first stage based on their inter-cluster distances and merges them into one cluster. This procedure is repeated until the number of clusters equals the predetermined number of clusters. Thereafter, particular instances in each cluster are considered for re-assignment to other clusters based on the intra-cluster distances. PM-FGCA is evaluated on ten categorical datasets from the UCI Machine Learning Repository. To assess clustering performance, it is compared with existing methods such as the k-modes algorithm, the fuzzy k-modes algorithm, the genetic fuzzy k-modes algorithm, and a non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Three clustering validation indices are used: the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) as external indices, and the Davies–Bouldin (DB) index as an internal index. The experimental results show that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.
• Use a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the partitioning stage.
• Provide a more compact clustering result through a repeated merging process, which merges two clusters from the partitioning stage based on the inter-cluster distances.
• Improve the clustering result by re-assigning some specific instances from their own clusters to other clusters based on intra-cluster distances. |
|---|---|
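
The merge stage described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state how inter-cluster distance is measured or which pair is selected, so the sketch assumes each cluster is represented by its per-attribute mode, that distance is the simple matching (Hamming) distance between modes, and that the closest pair is merged. The function names `mode_of`, `hamming`, and `merge_until` are hypothetical, and the fuzzy genetic partitioning and re-assignment steps are omitted.

```python
from collections import Counter
from itertools import combinations


def mode_of(cluster):
    """Return the per-attribute mode of a non-empty cluster of categorical records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def hamming(a, b):
    """Simple matching distance: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_until(clusters, target_k):
    """Merge the two clusters with the closest modes, repeating until
    only target_k clusters remain (assumed selection rule, see lead-in)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target_k:
        modes = [mode_of(c) for c in clusters]
        # Pick the pair of cluster indices whose modes are nearest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: hamming(modes[pair[0]], modes[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy over-partitioned result with three clusters, reduced to two.
initial = [
    [("a", "x"), ("a", "x")],
    [("a", "y")],
    [("b", "z"), ("b", "z")],
]
result = merge_until(initial, target_k=2)
```

In this toy input the first two clusters share the value `a` on the first attribute, so their modes are closest and they are merged, leaving the `("b", "z")` cluster untouched.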