Centroid Based Celestial Clustering Algorithm: A Novel Unsupervised Learning Method for Haemogram Data Clustering
| Published in | IEEE Transactions on Emerging Topics in Computational Intelligence, Vol. 7, no. 3, pp. 942-956 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.06.2023 |
| Subjects | |
| ISSN | 2471-285X |
| DOI | 10.1109/TETCI.2022.3211004 |
| Summary: | Clustering accuracy is the most important parameter in automated disease identification. There have long been attempts to automate disease prediction from haemogram data. However, blood test results have several components, and detecting a disease often requires considering many combinations of these component results. This makes disease identification hard and necessitates data analysis techniques. Because new diseases arise from time to time, unsupervised learning is a useful approach to prediction, and the corresponding data analysis technique is clustering. k-means is a simple, efficient, centroid-based clustering algorithm in wide use, and its simplicity and efficiency make it a natural choice for most clustering applications. However, k-means depends heavily on the choice of initial cluster centers, and a bad choice can trap it in a local optimum, sacrificing accuracy. It is also non-deterministic. This paper proposes a novel, nature-inspired clustering method, named Centroid Based Celestial Clustering, which overcomes these issues. Our method is deterministic and converges to the global optimum on spherical datasets. We experimentally evaluate our algorithm for execution speed and cluster quality against well-known clustering algorithms using statistical evaluation metrics such as silhouette width, adjusted Rand index and Dunn index. We use the method to predict diseases identifiable from blood tests, and our experiments show that the prediction accuracy is very promising. |
|---|---|
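
The summary's point about k-means depending on its initial cluster centers can be illustrated with a minimal sketch. This is plain Lloyd's k-means on made-up one-dimensional toy points, not the Centroid Based Celestial Clustering method, whose details this record does not give. The function names (`kmeans`, `sse`, `dunn_index`) and the data are assumptions chosen only for illustration. The Dunn index is computed here in one common form, the smallest distance between points of different clusters divided by the largest within-cluster diameter.

```python
from math import dist

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means started from explicit initial centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

def sse(points, labels, centers):
    """Within-cluster sum of squared errors, the objective k-means minimizes."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def dunn_index(points, labels):
    """Minimum between-cluster separation over maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    max_diameter = max(dist(a, b) for g in groups for a in g for b in g)
    min_separation = min(dist(a, b)
                         for i, g in enumerate(groups)
                         for h in groups[i + 1:]
                         for a in g for b in h)
    return min_separation / max_diameter

# Hypothetical toy data: three well-separated groups on a line.
points = [(float(x),) for x in (0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23)]

# One initial center per true group.
good_labels, good_centers = kmeans(points, [(0.0,), (10.0,), (20.0,)])
# Two initial centers inside the first group.
bad_labels, bad_centers = kmeans(points, [(0.0,), (3.0,), (13.0,)])

sse_good = sse(points, good_labels, good_centers)
sse_bad = sse(points, bad_labels, bad_centers)
dunn_good = dunn_index(points, good_labels)
dunn_bad = dunn_index(points, bad_labels)
print(sse_good, sse_bad, dunn_good, dunn_bad)
```

On this toy data the first start recovers the three groups, while the second splits the first group in two and merges the other two into one cluster. Both runs converge, but the second stops at a local optimum with a much larger sum of squared errors and a much smaller Dunn index, which is the failure mode the summary attributes to a bad choice of initial centers.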