K-Harmonic means type clustering algorithm for mixed datasets
    
| Published in | Applied Soft Computing, Vol. 48, pp. 39-49 |
|---|---|
| Main Authors | , | 
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier B.V., 01.11.2016 |
| Subjects | |
| ISSN | 1568-4946 1872-9681  | 
| DOI | 10.1016/j.asoc.2016.06.019 | 

Summary:

- A K-Harmonic clustering algorithm for mixed data is presented to reduce the random initialization problem of partitional algorithms.
- The proposed clustering algorithm uses a distance measure developed for mixed datasets.
- The experimental results suggest that clustering results are quite insensitive to random initialization.
- The proposed algorithm performed better than other clustering algorithms on various datasets.

K-means type clustering algorithms for mixed data, which consists of numeric and categorical attributes, suffer from the cluster center initialization problem: the final clustering results depend on the initial cluster centers. Random cluster center initialization is a popular technique, but it yields clustering results that are not consistent across different initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition of a cluster center and a distance measure, and use them with the cost function of the K-Harmonic means clustering algorithm in the proposed algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to cluster center initialization. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
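
To make the cost function mentioned in the summary concrete, the following is a minimal Python sketch, not the paper's implementation. It evaluates the standard K-Harmonic means cost, which sums over all data points the harmonic average of the p-th powers of the point-to-center distances. The record does not give the paper's distance measure or cluster center definition, so `mixed_distance` is a hypothetical stand-in (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches); the names `mixed_distance`, `khm_cost`, `gamma`, `p`, and `eps` are assumptions made for illustration only.

```python
import numpy as np


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Hypothetical stand-in dissimilarity, NOT the paper's measure:
    squared Euclidean distance on the numeric attributes plus gamma
    times the number of mismatched categorical attributes."""
    numeric_part = float(np.sum((np.asarray(x_num) - np.asarray(c_num)) ** 2))
    categorical_part = sum(a != b for a, b in zip(x_cat, c_cat))
    return numeric_part + gamma * categorical_part


def khm_cost(distances, p=2.0, eps=1e-12):
    """K-Harmonic means cost for an (n_points, k) distance matrix:
    sum over points i of k / sum_j (1 / d_ij ** p)."""
    # Clamp distances so a point sitting exactly on a center
    # does not cause a division by zero.
    d = np.maximum(np.asarray(distances, dtype=float), eps)
    k = d.shape[1]
    return float(np.sum(k / np.sum(d ** (-p), axis=1)))


if __name__ == "__main__":
    # Toy mixed dataset: two numeric and two categorical attributes per point.
    points_num = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]])
    points_cat = [("red", "s"), ("red", "m"), ("blue", "l")]
    centers_num = np.array([[0.2, 0.1], [4.5, 5.5]])
    centers_cat = [("red", "s"), ("blue", "l")]

    # Distance of every point to every center.
    dist = np.array([
        [mixed_distance(xn, xc, cn, cc)
         for cn, cc in zip(centers_num, centers_cat)]
        for xn, xc in zip(points_num, points_cat)
    ])
    print("KHM cost:", khm_cost(dist))
```

Because the harmonic average is dominated by the smallest distance while still involving every center, each point contributes to the update of all centers rather than only its nearest one. This soft assignment is the usual explanation for why K-Harmonic means is less sensitive to initialization than K-means.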