Efficient shared memory and RDMA based collectives on multi-rail QsNetII SMP clusters
| Published in | Cluster Computing, Vol. 11, no. 4, pp. 341-354 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Boston: Springer US, 01.12.2008; Springer Nature B.V |
| Subjects | |
| ISSN | 1386-7857 (print), 1573-7543 (online) |
| DOI | 10.1007/s10586-008-0065-8 |
| Summary: | Clusters of Symmetric Multiprocessors (SMP) are more common than ever for achieving high performance, and scientific applications running on clusters use collective communications extensively. Shared memory communication and Remote Direct Memory Access (RDMA) over multi-rail networks are promising approaches for meeting the increasing demand on intra-node and inter-node communication, and thereby for boosting the performance of collectives on emerging multi-core SMP clusters. To this end, this paper designs and evaluates two classes of collective communication algorithms implemented directly at the Elan user level over multi-rail Quadrics QsNetII with message striping: 1) RDMA-based traditional multi-port algorithms for the gather, all-gather, and all-to-all collectives for medium to large messages, and 2) RDMA-based, SMP-aware multi-port all-gather algorithms for small to medium messages. The multi-port RDMA-based Direct algorithm for gather and all-to-all improves on `elan_gather()` by a factor of up to 2.15 for 4 KB messages, and on `elan_alltoall()` by a factor of up to 2.26 for 2 KB messages. For all-gather, our SMP-aware Bruck algorithm outperforms all other all-gather algorithms, including `elan_gather()`, for 512 B to 8 KB messages, with a 1.96 improvement factor for 4 KB messages. Our multi-port Direct all-gather is the best algorithm for 16 KB to 1 MB messages and outperforms `elan_gather()` by a factor of 1.49 for 32 KB messages. Experiments with real applications show that the proposed all-gather algorithms achieve a communication speedup of up to 1.47. |
|---|---|
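The summary contrasts a Direct all-gather (best for 16 KB to 1 MB) with a Bruck all-gather (best for 512 B to 8 KB) but does not describe either schedule. As a rough illustration only, the sketch below simulates the textbook, non-SMP-aware versions of both patterns in a single process. It does not model the paper's SMP-aware variant, shared-memory stages, multi-port concurrency, or Elan RDMA calls, and the function names are hypothetical. Direct needs p - 1 independent exchanges (which a multi-port implementation can overlap), while Bruck needs only ceil(log2 p) rounds but forwards data through intermediate ranks. This is consistent with, though not a substitute for, the paper's finding that Bruck favors small messages and Direct favors large ones.

```python
def direct_allgather(blocks):
    """Textbook Direct all-gather on p simulated ranks (hypothetical helper).

    In step k (1 <= k < p), rank r sends its own block to rank (r + k) mod p.
    The p - 1 steps are independent, so a multi-port version could issue them concurrently.
    Returns (per-rank result buffers, number of steps).
    """
    p = len(blocks)
    out = [[None] * p for _ in range(p)]
    for r in range(p):
        out[r][r] = blocks[r]  # each rank already holds its own block
    for k in range(1, p):
        for r in range(p):
            out[(r + k) % p][r] = blocks[r]
    return out, p - 1


def bruck_allgather(blocks):
    """Textbook Bruck all-gather on p simulated ranks (hypothetical helper).

    buf[r][j] holds block (r + j) mod p. In the round with distance d, rank r
    receives the first min(d, p - d) blocks held by rank (r + d) mod p.
    Returns (per-rank result buffers, number of rounds).
    """
    p = len(blocks)
    buf = [[blocks[r]] for r in range(p)]
    rounds = 0
    d = 1
    while d < p:
        cnt = min(d, p - d)
        # Snapshot all outgoing data first so every rank exchanges "simultaneously".
        incoming = [buf[(r + d) % p][:cnt] for r in range(p)]
        for r in range(p):
            buf[r].extend(incoming[r])
        d *= 2
        rounds += 1
    # Final local rotation: block at local index j belongs at global index (r + j) mod p.
    result = []
    for r in range(p):
        out = [None] * p
        for j, blk in enumerate(buf[r]):
            out[(r + j) % p] = blk
        result.append(out)
    return result, rounds
```

For example, with six ranks both functions leave every rank holding all six blocks in rank order, after 5 Direct steps versus 3 Bruck rounds.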
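The summary states that the algorithms use message striping over multi-rail QsNetII but gives no details of how messages are split. A minimal sketch, assuming simple even splitting of one contiguous buffer across rails, is shown below. The function `stripe_message` and its `min_chunk` cutoff (which falls back to fewer rails for small messages) are hypothetical and not taken from the paper. In a real implementation each piece would be posted as a separate RDMA transfer on its own rail; that part is not shown.

```python
from __future__ import annotations


def stripe_message(nbytes: int, nrails: int, min_chunk: int = 1) -> list[tuple[int, int, int]]:
    """Split an nbytes message into contiguous (rail, offset, length) pieces.

    Hypothetical helper: min_chunk is an assumed lower bound on bytes per rail,
    so small messages use fewer rails (down to one) instead of tiny pieces.
    """
    if nbytes < 0 or nrails < 1 or min_chunk < 1:
        raise ValueError("nbytes must be >= 0; nrails and min_chunk must be >= 1")
    used = max(1, min(nrails, nbytes // min_chunk))
    base, extra = divmod(nbytes, used)
    pieces = []
    offset = 0
    for rail in range(used):
        # The first `extra` rails carry one extra byte so lengths differ by at most 1.
        length = base + (1 if rail < extra else 0)
        pieces.append((rail, offset, length))
        offset += length
    return pieces
```

For instance, a 4096-byte message over two rails with `min_chunk=1024` is split into two 2048-byte pieces, while a 1000-byte message with the same cutoff stays on a single rail.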