Min‐max kurtosis stratum mean: An improved K‐means cluster initialization approach for microarray gene clustering on multidimensional big data


Bibliographic Details
Published in Concurrency and computation Vol. 34; no. 23
Main Authors Pandey, Kamlesh Kumar, Shukla, Diwakar
Format Journal Article
Language English
Published Hoboken, USA John Wiley & Sons, Inc 25.10.2022
Wiley Subscription Services, Inc
ISSN 1532-0626
1532-0634
DOI 10.1002/cpe.7185

Summary: Microarray gene clustering is a big data application that employs the K‐means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions and gene trends for disease diagnosis, tissue detection and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it influences the effectiveness and efficiency of the clustering and the local optimum to which it converges. Existing centroid initialization is computationally expensive and degrades cluster quality because of the high dimensionality and interconnectedness of microarray gene data. To deal with this issue, this study proposes the min‐max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single‐machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for gene relationship identification, and stratification for heterogeneous centroid extraction. The results of the presented algorithm are compared with the state‐of‐the‐art initialization strategy on twelve microarray gene datasets using internal, external and statistical assessment criteria. The experimental results demonstrate that the MKSMKM algorithm reduces iterations, distance computations, data comparisons and local optima, and improves cluster performance, effectiveness and efficiency with stable convergence.
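The summary names the ingredients of MKSM (kurtosis-based dimension selection, a mean-based relationship score, and stratification) but does not give the procedure itself. The following is only a rough sketch of how such a seeding could look, not the published algorithm. It assumes that the two columns with minimum and maximum excess kurtosis are selected, that each row is scored by its mean over those columns, and that the sorted rows are cut into k equal-sized strata whose means serve as initial centroids. The function names `excess_kurtosis` and `stratum_mean_init` are hypothetical.

```python
import numpy as np


def excess_kurtosis(X):
    """Per-column excess kurtosis (fourth standardized moment minus 3)."""
    centered = X - X.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    # Guard against constant columns to avoid division by zero.
    var = np.where(var == 0, 1.0, var)
    return (centered ** 4).mean(axis=0) / var ** 2 - 3.0


def stratum_mean_init(X, k):
    """Hypothetical kurtosis-guided stratified seeding (assumption, not MKSM itself)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of rows")
    kurt = excess_kurtosis(X)
    # Assumed dimension selection: the min- and max-kurtosis columns.
    dims = np.unique([np.argmin(kurt), np.argmax(kurt)])
    # Assumed relationship score: mean of each row over the selected columns.
    score = X[:, dims].mean(axis=1)
    order = np.argsort(score, kind="stable")
    # Cut the sorted rows into k nearly equal-sized strata.
    strata = np.array_split(order, k)
    # One centroid per stratum: the mean of its rows over all columns.
    return np.vstack([X[idx].mean(axis=0) for idx in strata])
```

The returned array has one row per cluster and could be passed as the starting centroids to any K-means implementation that accepts explicit seeds, for example through the `init` parameter of scikit-learn's `KMeans` with `n_init=1`.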