A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality

Bibliographic Details
Published in: IEEE Access, Vol. 7, pp. 42280-42297
Main Authors: Kwedlo, Wojciech; Czochanski, Pawel J.
Format: Journal Article
Language: English
Published: IEEE, 2019
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2907885

Summary: The standard formulation of K-means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations: Drake's, Elkan's, Annulus, and Yinyang algorithms. We propose a hybrid MPI/OpenMP parallelization of these algorithms in which the dataset and the corresponding data structures storing bounds on distances are evenly divided among MPI processes. In the assignment step of a K-means iteration, each MPI process computes the assignment of its portion of the data using OpenMP threads. In the update step of the iteration, the cluster centroids are computed using a hierarchical all-reduce operation. In the computational experiments, we compared the strong scalability of these four algorithms with that of Lloyd's algorithm, parallelized using the same approach. The results indicate that all four algorithms maintain an advantage in computing time over Lloyd's algorithm. A comparison, in the same computing environment, with two software packages whose source code is publicly available shows that our implementations are more efficient.
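The two ideas in the summary, pruning distance calculations with the triangle inequality and splitting the data across processes whose partial results are combined by an all-reduce, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain Python with no MPI or OpenMP, and it uses only the inter-centroid test from Elkan's algorithm (if d(c_best, c_j) >= 2 d(x, c_best), then c_j cannot be closer to x than c_best), which is just one of the bounds the four algorithms maintain. The data blocks stand in for MPI processes, and a simple loop that adds the per-block sums stands in for the (hierarchical) all-reduce. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids) -> Tuple[List[int], int]:
    """Nearest-centroid assignment that skips distances ruled out by the
    triangle inequality (Elkan-style inter-centroid test).

    If d(c_best, c_j) >= 2 * d(x, c_best), then
    d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best),
    so c_j cannot be strictly closer and d(x, c_j) is never computed.
    Returns (labels, number of point-to-centroid distances computed).
    """
    k = len(centroids)
    # Inter-centroid distances, computed once per iteration.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, dist(x, centroids[0])
        computed += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned: c_j cannot beat the current best
            d = dist(x, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def local_sums(points, labels, k: int, dim: int):
    """Per-cluster coordinate sums and counts for one block of data."""
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, labels):
        counts[c] += 1
        for t in range(dim):
            sums[c][t] += x[t]
    return sums, counts


def kmeans_partitioned(points, centroids, n_procs: int, iters: int):
    """Lloyd iterations with the data split into n_procs contiguous blocks.

    Each block plays the role of one (hypothetical) MPI process; adding the
    per-block sums and counts is a stand-in for an all-reduce with a sum.
    """
    dim, k = len(points[0]), len(centroids)
    chunk = math.ceil(len(points) / n_procs)
    parts = [points[r * chunk:(r + 1) * chunk] for r in range(n_procs)]
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for part in parts:  # assignment step, independent per block
            labels, _ = assign_with_pruning(part, centroids)
            s, n = local_sums(part, labels, k, dim)
            for c in range(k):  # "all-reduce": element-wise sum
                total_counts[c] += n[c]
                for t in range(dim):
                    total_sums[c][t] += s[c][t]
        for c in range(k):  # update step; empty clusters keep their centroid
            if total_counts[c] > 0:
                centroids[c] = [v / total_counts[c] for v in total_sums[c]]
    return centroids
```

Because a pruned centroid can never be strictly closer, the pruned assignment returns the same labels as a brute-force nearest-centroid search with ties resolved to the lowest index, only with fewer distance calculations. For example, with the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) and initial centroids (0, 0) and (10, 10), it computes 9 point-to-centroid distances instead of 12. In a real distributed version, the summing loop would be replaced by a collective such as `MPI_Allreduce` with `MPI_SUM`.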