Bandwidth optimal all-reduce algorithms for clusters of workstations
We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, under the assumption that the reduce operator is associative and commutative. We derive a tight lower bound of the amount of data that must be communicated in order to complete this operat...
| Published in | Journal of parallel and distributed computing Vol. 69; no. 2; pp. 117-124 |
|---|---|
| Main Authors | Patarasuk, Pitch; Yuan, Xin |
| Format | Journal Article | 
| Language | English | 
| Published | Amsterdam, Elsevier Inc, 01.02.2009 |
| ISSN | 0743-7315 (print), 1096-0848 (electronic) |
| DOI | 10.1016/j.jpdc.2008.09.002 | 
| Abstract | We consider an efficient realization of the all-reduce operation with large data sizes in cluster environments, under the assumption that the reduce operator is associative and commutative. We derive a tight lower bound of the amount of data that must be communicated in order to complete this operation and propose a ring-based algorithm that only requires tree connectivity to achieve bandwidth optimality. Unlike the widely used butterfly-like all-reduce algorithm that incurs network contention in SMP/multi-core clusters, the proposed algorithm can achieve contention-free communication in almost all contemporary clusters, including SMP/multi-core clusters and Ethernet switched clusters with multiple switches. We demonstrate that the proposed algorithm is more efficient than other algorithms on clusters with different nodal architectures and networking technologies when the data size is sufficiently large. |
| Author | Yuan, Xin; Patarasuk, Pitch |
    
| Author_xml | 1. Patarasuk, Pitch (patarasu@cs.fsu.edu); 2. Yuan, Xin (xyuan@cs.fsu.edu) |
    
| Copyright | 2008 Elsevier Inc. 2009 INIST-CNRS  | 
    
| Discipline | Computer Science; Applied Sciences |
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 2 | 
    
| Keywords | All-reduce; Tree topology; Collective communication; Cluster of workstations; Lower bound; Optimal algorithm; Symmetric configuration; Calculator cluster; Distributed system; Topology; Multicore processor; Multiprocessor; Ring; Ethernet; Bandwidth; Switching networks |
    
| License | CC BY 4.0 | 
    
| PageCount | 8 | 
    
| PublicationDate | 2009-02-01 | 
    
| SubjectTerms | All-reduce; Applied sciences; Cluster of workstations; Collective communication; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Exact sciences and technology; Software; Tree topology |
    
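
The abstract above describes a ring-based all-reduce for an associative, commutative reduce operator. As a rough illustration only, the following single-machine simulation sketches the generic ring scheme (a reduce-scatter pass followed by an all-gather pass). It is not taken from the paper: the function name `ring_allreduce`, the chunking rule, and the `sent` counter are assumptions made for this sketch, and it does not model network topology or contention.

```python
from operator import add


def ring_allreduce(inputs, op=add):
    """Simulate a ring all-reduce over p processes holding equal-length vectors.

    Hypothetical helper for illustration. `op` is assumed to be associative and
    commutative. Returns (results, sent): results[r] is the reduced vector at
    rank r, and sent[r] is the number of elements rank r transmitted.
    """
    p = len(inputs)
    n = len(inputs[0])
    if any(len(v) != n for v in inputs):
        raise ValueError("all input vectors must have the same length")
    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [(c * n) // p for c in range(p + 1)]
    data = [list(v) for v in inputs]
    sent = [0] * p

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod p
    # to rank r + 1, which combines it with its own copy of that chunk.
    for s in range(p - 1):
        # Snapshot all outgoing messages first so sends within a step are simultaneous.
        msgs = [((r - s) % p, None) for r in range(p)]
        msgs = [(c, [data[r][i] for i in chunk(c)]) for r, (c, _) in enumerate(msgs)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % p
            for i, x in zip(chunk(c), payload):
                data[dst][i] = op(data[dst][i], x)
            sent[r] += len(payload)

    # Rank r now holds the fully reduced chunk (r + 1) mod p.
    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) mod p.
    for s in range(p - 1):
        msgs = [((r + 1 - s) % p, None) for r in range(p)]
        msgs = [(c, [data[r][i] for i in chunk(c)]) for r, (c, _) in enumerate(msgs)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % p
            for i, x in zip(chunk(c), payload):
                data[dst][i] = x
            sent[r] += len(payload)

    return data, sent
```

In this sketch each simulated process sends one chunk per step over 2(p - 1) steps, so when n is divisible by p it transmits 2(p - 1)n/p elements in total. Whether that figure coincides with the lower bound derived in the article is not stated in this record.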