Bandwidth optimal all-reduce algorithms for clusters of workstations

Bibliographic Details
Published in Journal of parallel and distributed computing, Vol. 69, no. 2, pp. 117-124
Main Authors Patarasuk, Pitch; Yuan, Xin
Format Journal Article
Language English
Published Amsterdam Elsevier Inc 01.02.2009
Elsevier
ISSN 0743-7315
1096-0848
DOI 10.1016/j.jpdc.2008.09.002

Abstract We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, under the assumption that the reduce operator is associative and commutative. We derive a tight lower bound on the amount of data that must be communicated to complete this operation, and we propose a ring-based algorithm that requires only tree connectivity to achieve bandwidth optimality. Unlike the widely used butterfly-like all-reduce algorithm, which incurs network contention in SMP/multi-core clusters, the proposed algorithm can achieve contention-free communication in almost all contemporary clusters, including SMP/multi-core clusters and Ethernet switched clusters with multiple switches. We demonstrate that the proposed algorithm is more efficient than other algorithms on clusters with different nodal architectures and networking technologies when the data size is sufficiently large.
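
The abstract names a ring-based all-reduce but this record does not reproduce the algorithm itself. As an illustration only, the following is a minimal sketch of the generic textbook ring scheme (a reduce-scatter phase followed by an all-gather phase), simulated in a single Python process. It is not taken from the paper; the function name `ring_allreduce`, the use of integer addition as the reduce operator, and the chunking rule are all assumptions made for this example.

```python
from typing import List, Tuple


def ring_allreduce(data: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Simulate a sum all-reduce over p logical ranks arranged in a ring.

    Hypothetical helper for illustration: data[r] is rank r's input vector,
    and all vectors have the same length n. Returns the per-rank result
    buffers and the number of elements each rank sent.
    """
    p = len(data)
    n = len(data[0])
    # Split the n elements into p near-equal chunks (assumed chunking rule).
    bounds = [(c * n) // p for c in range(p + 1)]
    buf = [list(v) for v in data]   # working copy, one buffer per rank
    sent = [0] * p                  # elements sent by each rank

    def span(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    # Phase 1 (reduce-scatter): at step s, rank r sends chunk (r - s) mod p
    # to its right neighbour, which adds it into its own buffer.
    for s in range(p - 1):
        # Snapshot all outgoing messages so the sends in a step are simultaneous.
        msgs = [[buf[r][i] for i in span((r - s) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for i, v in zip(span((r - s) % p), msgs[r]):
                buf[dst][i] += v
            sent[r] += len(msgs[r])

    # Rank r now holds the fully reduced chunk (r + 1) mod p.
    # Phase 2 (all-gather): at step s, rank r forwards chunk (r + 1 - s) mod p,
    # and the right neighbour overwrites its copy with the reduced values.
    for s in range(p - 1):
        msgs = [[buf[r][i] for i in span((r + 1 - s) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for i, v in zip(span((r + 1 - s) % p), msgs[r]):
                buf[dst][i] = v
            sent[r] += len(msgs[r])

    return buf, sent
```

In this sketch every rank talks only to its right neighbour, and when n is divisible by p each rank sends 2(p - 1) chunks of n/p elements, so the per-rank volume stays below 2n regardless of p. Whether this matches the exact bound and schedule derived in the article should be checked against the full text.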
Author Patarasuk, Pitch (patarasu@cs.fsu.edu)
Yuan, Xin (xyuan@cs.fsu.edu)
ContentType Journal Article
Copyright 2008 Elsevier Inc.
2009 INIST-CNRS
Discipline Computer Science
Applied Sciences
EISSN 1096-0848
EndPage 124
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords All-reduce
Tree topology
Collective communication
Cluster of workstations
Lower bound
Optimal algorithm
Symmetric configuration
Calculator cluster
Distributed system
Topology
Multicore processor
Multiprocessor
Ring
Ethernet
Bandwidth
Switching networks
Language English
License CC BY 4.0
PageCount 8
PublicationDate 2009-02-01
PublicationPlace Amsterdam
PublicationTitle Journal of parallel and distributed computing
PublicationYear 2009
Publisher Elsevier Inc
Elsevier
StartPage 117
SubjectTerms All-reduce
Applied sciences
Cluster of workstations
Collective communication
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Exact sciences and technology
Software
Tree topology
Title Bandwidth optimal all-reduce algorithms for clusters of workstations
URI https://dx.doi.org/10.1016/j.jpdc.2008.09.002
https://www.proquest.com/docview/34550640
Volume 69