Hierarchical redesign of classic MPI reduction algorithms

Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in the state-of-the-art MPI implementations. Hierarchical topology-oblivious tra...

Full description

Saved in:
Bibliographic Details
Published inThe Journal of supercomputing Vol. 73; no. 2; pp. 713 - 725
Main Authors Hasanov, Khalid, Lastovetsky, Alexey
Format Journal Article
LanguageEnglish
Published New York Springer US 01.02.2017
Springer Nature B.V
Subjects
Online AccessGet full text
ISSN0920-8542
1573-0484
DOI10.1007/s11227-016-1779-7

Cover

More Information
Summary:Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in the state-of-the-art MPI implementations. Hierarchical topology-oblivious transformation of existing communication algorithms has been recently proposed as a new promising approach to optimization of MPI collective communication algorithms and MPI-based applications. This approach has been successfully applied to the most popular parallel matrix multiplication algorithm, SUMMA, and the state-of-the-art MPI broadcast algorithms, demonstrating significant multifold performance gains, especially for large-scale HPC systems. In this paper, we apply this approach to optimization of the MPI Reduce and Allreduce operations. Theoretical analysis and experimental results on a cluster of Grid’5000 platform are presented.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-016-1779-7