Min‐max kurtosis stratum mean: An improved K‐means cluster initialization approach for microarray gene clustering on multidimensional big data

Bibliographic Details
Published in	Concurrency and computation Vol. 34; no. 23
Main Authors	Pandey, Kamlesh Kumar, Shukla, Diwakar
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc, 25.10.2022
Subjects	Algorithms; Big Data; big data clustering; Centroids; Clustering; Effectiveness; gene clustering; initial centroid; Kurtosis; K‐means; Mean; microarray clustering; multidimensional clustering
ISSN	1532-0626 1532-0634
DOI	10.1002/cpe.7185

Summary:	Microarray gene clustering is a big data application that employs the K‐means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions and gene trends for disease diagnosis, tissue detection and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it influences the effectiveness, efficiency and local optima of the clustering. Existing centroid initialization algorithms are computationally expensive and degrade cluster quality due to the large dimensionality and interconnectedness of microarray gene data. To deal with this issue, this study proposes the min‐max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for gene relationship identification, and stratification for heterogeneous centroid extraction. The presented algorithm is compared to the state‐of‐the‐art initialization strategy on twelve microarray gene datasets using internal, external and statistical assessment criteria. The experimental results demonstrate that the MKSMKM algorithm reduces iterations, distance computations, data comparisons and local optima, and improves cluster performance, effectiveness and efficiency with stable convergence.
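The three ingredients named in the summary (kurtosis for dimension selection, a distance score per record, and stratification for centroid extraction) can be illustrated with a rough sketch. This is not the published MKSM procedure: the record does not give its details, so the choice of exactly two dimensions, the distance from the minimum corner as the score, the equal-size strata, and the function names `kurtosis` and `stratum_mean_init` are all assumptions made for illustration.

```python
from statistics import fmean


def kurtosis(values):
    """Population kurtosis (fourth standardized moment); 0.0 for a constant column."""
    m = fmean(values)
    var = fmean((v - m) ** 2 for v in values)
    if var == 0:
        return 0.0
    return fmean((v - m) ** 4 for v in values) / var ** 2


def stratum_mean_init(rows, k):
    """Return k initial centroids, one mean vector per stratum (illustrative only)."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    n_dims = len(rows[0])
    columns = [[row[d] for row in rows] for d in range(n_dims)]
    kurt = [kurtosis(col) for col in columns]

    # Assumption: keep only the dimensions with the smallest and largest kurtosis.
    selected = sorted({kurt.index(min(kurt)), kurt.index(max(kurt))})

    # Assumption: score each row by its Euclidean distance, measured on the
    # selected dimensions only, from the per-dimension minimum.
    origin = [min(columns[d]) for d in selected]

    def score(row):
        return sum((row[d] - o) ** 2 for d, o in zip(selected, origin)) ** 0.5

    ordered = sorted(rows, key=score)

    # Assumption: split the ordered rows into k contiguous, near-equal strata
    # and use the full-dimensional mean of each stratum as a centroid.
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]
    centroids = []
    for lo, hi in zip(bounds, bounds[1:]):
        stratum = ordered[lo:hi]
        centroids.append([fmean(r[d] for r in stratum) for d in range(n_dims)])
    return centroids


if __name__ == "__main__":
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for centroid in stratum_mean_init(toy, 2):
        print(centroid)
```

The returned centroids would then be handed to an ordinary K‐means loop in place of random starting points. Because every step is deterministic, repeated runs on the same data start from the same centroids.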