Fuzzy Centroid and Genetic Algorithms: Solutions for Numeric and Categorical Mixed Data Clustering

Bibliographic Details
Published in	Procedia Computer Science Vol. 179; pp. 677-684
Main Authors	Nooraeni, Rani; Arsa, Muhamad Iqbal; Kusumo Projo, Nucke Widowati
Format	Journal Article
Language	English
Published	Elsevier B.V. 2021
Subjects	Clustering; Data Mining; Fuzzy K Prototype; Genetic Algorithm; Mixed Data
ISSN	1877-0509
DOI	10.1016/j.procs.2021.01.055

Summary:	Clustering is a widely used technique for statistical data analysis in machine learning and data mining. However, data with both numeric and categorical attributes, known as mixed data, is common in real-world applications. K-prototype is a well-known algorithm for clustering mixed data because it handles large data sets effectively. In practice, however, k-prototype has two main weaknesses: the mode used as the cluster center for categorical attributes cannot accurately represent the objects, and the algorithm may stop at a local optimum because it is affected by the random initial cluster prototypes. The first weakness can be addressed with a fuzzy centroid, and the second by applying a genetic algorithm to search for the global optimum. Our research combines the genetic algorithm and fuzzy k-prototype to address both weaknesses. We set up two multivariate data sets, one with high correlation and one with low correlation, to examine the robustness of the proposed algorithm. Four clustering evaluation indexes (Coefficient of Variance Index, Partition Coefficient, Partition Entropy, and Purity) show that the proposed algorithm gives better results than k-prototype. Based on these evaluation results, we conclude that the proposed algorithm can address the two weaknesses of the k-prototype algorithm.
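
Three of the evaluation indexes named in the summary can be illustrated with a minimal Python sketch. It assumes the commonly used textbook definitions of Partition Coefficient, Partition Entropy, and Purity, which may differ in detail from the formulas used in the article. The function names (`partition_coefficient`, `partition_entropy`, `purity`, `harden`) and the sample membership matrix are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Assumption: u is a fuzzy membership matrix given as a list of rows,
# one row per object, one column per cluster, each row summing to 1.

def partition_coefficient(u):
    """Mean of squared memberships: 1.0 for a crisp partition, 1/k if uniform."""
    return sum(m * m for row in u for m in row) / len(u)

def partition_entropy(u):
    """Mean membership entropy: 0.0 for a crisp partition, log(k) if uniform."""
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / len(u)

def harden(u):
    """Assign each object to its highest-membership cluster (ties: lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]

def purity(pred, true):
    """Fraction of objects that carry the majority true label of their cluster."""
    per_cluster = {}
    for p, t in zip(pred, true):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority / len(true)

# Hypothetical example: four objects, two clusters, known labels.
u = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
labels = ["a", "a", "b", "b"]
pc = partition_coefficient(u)
pe = partition_entropy(u)
pur = purity(harden(u), labels)
```

Under these definitions, a higher Partition Coefficient and Purity and a lower Partition Entropy indicate a sharper and more accurate partition.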