Fuzzy Centroid and Genetic Algorithms: Solutions for Numeric and Categorical Mixed Data Clustering

Bibliographic Details
Published in Procedia Computer Science, Vol. 179, pp. 677-684
Main Authors Nooraeni, Rani; Arsa, Muhamad Iqbal; Kusumo Projo, Nucke Widowati
Format Journal Article
Language English
Published Elsevier B.V. 2021
ISSN 1877-0509
DOI 10.1016/j.procs.2021.01.055

Summary: Clustering is widely used for statistical data analysis in machine learning and data mining. However, real-world data often contain both numeric and categorical attributes (mixed data). K-prototype is a well-known algorithm for clustering mixed data because it handles large data sets effectively. In practice, however, k-prototype has two main weaknesses: the mode used as the cluster center for categorical attributes cannot accurately represent the objects, and the algorithm may stop at a local optimum because it is affected by the random initial cluster prototypes. The first weakness can be addressed with a fuzzy centroid, and the second by using a genetic algorithm to search for the global optimum. Our research combines the genetic algorithm and Fuzzy K-Prototype to address both weaknesses. We set up two multivariate data sets, one with high correlation and one with low correlation, to assess the robustness of the proposed algorithm. Four clustering evaluation indexes, the Coefficient Variance Index, Partition Coefficient, Partition Entropy, and Purity, show that the proposed algorithm gives better results than k-prototype. Based on the evaluation results, we conclude that the proposed algorithm can overcome the two weaknesses of the k-prototype algorithm.
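
Three of the evaluation indexes named in the summary can be illustrated with a minimal Python sketch. It assumes the commonly used textbook definitions (Partition Coefficient as the mean of squared membership degrees, Partition Entropy as the mean negative entropy of the memberships, Purity as the share of objects that fall in their cluster's majority class). The summary does not give the formulas, so the paper's exact definitions may differ. The function names and the toy inputs are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def partition_coefficient(u):
    """Mean of squared membership degrees (assumed textbook definition).

    u is a list of rows, one per object; each row holds that object's
    membership degrees across the clusters and is assumed to sum to 1.
    Values closer to 1 indicate a crisper partition.
    """
    return sum(m * m for row in u for m in row) / len(u)


def partition_entropy(u):
    """Mean negative entropy of the memberships (assumed textbook definition).

    Zero memberships are skipped, since m * log(m) tends to 0 as m -> 0.
    Values closer to 0 indicate a crisper partition.
    """
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / len(u)


def purity(predicted, truth):
    """Share of objects that belong to the majority true class of their cluster."""
    counts_per_cluster = {}
    for cluster, label in zip(predicted, truth):
        counts_per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in counts_per_cluster.values())
    return majority_total / len(truth)


# Toy inputs (hypothetical): two objects and two clusters for the fuzzy indexes,
# four objects with hard cluster assignments for purity.
memberships = [[1.0, 0.0], [0.5, 0.5]]
pc = partition_coefficient(memberships)
pe = partition_entropy(memberships)
pur = purity([0, 0, 1, 1], ["a", "a", "a", "b"])
```

Under these assumed definitions, a higher Partition Coefficient and Purity and a lower Partition Entropy indicate a better clustering result, which is how such indexes are typically used to compare two algorithms on the same data.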