Bandwidth optimal all-reduce algorithms for clusters of workstations


Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 69, no. 2, pp. 117-124
Main Authors: Patarasuk, Pitch; Yuan, Xin
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Inc, 01.02.2009
ISSN: 0743-7315, 1096-0848
DOI: 10.1016/j.jpdc.2008.09.002

Summary: We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, under the assumption that the reduce operator is associative and commutative. We derive a tight lower bound on the amount of data that must be communicated in order to complete this operation and propose a ring-based algorithm that only requires tree connectivity to achieve bandwidth optimality. Unlike the widely used butterfly-like all-reduce algorithm that incurs network contention in SMP/multi-core clusters, the proposed algorithm can achieve contention-free communication in almost all contemporary clusters, including SMP/multi-core clusters and Ethernet switched clusters with multiple switches. We demonstrate that the proposed algorithm is more efficient than other algorithms on clusters with different nodal architectures and networking technologies when the data size is sufficiently large.
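
Illustration: the summary refers to a ring-based all-reduce. The sketch below simulates the generic textbook ring scheme (a reduce-scatter phase followed by an all-gather phase) in a single Python process, using summation as the associative, commutative operator. It is not taken from the article and does not model the contention-free mapping of the ring onto a tree-connected cluster that the summary mentions. The function name `ring_allreduce`, the chunking scheme, and the assumption that the vector length is divisible by the number of processes are all choices made for this illustration.

```python
from typing import List, Tuple


def ring_allreduce(data: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Simulate a sum all-reduce over p ranks arranged in a logical ring.

    data[r] is the input vector of rank r. All vectors have length n, and
    n is assumed (for this sketch only) to be divisible by p.
    Returns (final buffer of every rank, number of elements each rank sent).
    """
    p = len(data)
    n = len(data[0])
    assert n % p == 0, "sketch assumes n divisible by p"
    c = n // p                          # elements per chunk
    buf = [list(v) for v in data]       # private working copy per rank
    sent = [0] * p                      # elements sent by each rank

    def chunk(k: int) -> range:
        """Index range of chunk k (taken modulo p)."""
        start = (k % p) * c
        return range(start, start + c)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) to
    # rank r + 1, which adds it into its own copy of that chunk.
    for s in range(p - 1):
        # Snapshot all payloads first so the sends in a step are simultaneous.
        msgs = [[buf[r][i] for i in chunk(r - s)] for r in range(p)]
        for r, payload in enumerate(msgs):
            dst = (r + 1) % p
            for i, x in zip(chunk(r - s), payload):
                buf[dst][i] += x
            sent[r] += c
    # Now rank r holds the fully reduced chunk (r + 1) mod p.

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) to
    # rank r + 1, which overwrites its copy with the reduced values.
    for s in range(p - 1):
        msgs = [[buf[r][i] for i in chunk(r + 1 - s)] for r in range(p)]
        for r, payload in enumerate(msgs):
            dst = (r + 1) % p
            for i, x in zip(chunk(r + 1 - s), payload):
                buf[dst][i] = x
            sent[r] += c

    return buf, sent


if __name__ == "__main__":
    inputs = [[r * 10 + i for i in range(8)] for r in range(4)]
    result, sent = ring_allreduce(inputs)
    print(result[0])   # every rank ends with the same element-wise sum
    print(sent)        # each rank sends 2 * (p - 1) * n / p elements
```

In this simulation each rank sends 2(p - 1) chunks of n/p elements, so the per-rank traffic stays below 2n elements however many ranks take part. That property is why ring schemes are attractive when the data size is large.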