PIAR: Path-Improved Adaptive Routing for Dragonfly Networks
For the next-generation exascale supercomputing communication systems, Dragonfly topology offers strong scalability, low latency, and cost efficiency. Dragonfly networks have already been implemented in current supercomputers and will continue to expand in future systems. Adaptive routing in Dragonf...
Saved in:
| Published in | Proceedings / IEEE International Conference on Cluster Computing pp. 1 - 11 |
|---|---|
| Main Authors | , , , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published |
IEEE
02.09.2025
|
| Subjects | |
| Online Access | Get full text |
| ISSN | 2168-9253 |
| DOI | 10.1109/CLUSTER59342.2025.11186482 |
Cover
| Summary: | For the next-generation exascale supercomputing communication systems, Dragonfly topology offers strong scalability, low latency, and cost efficiency. Dragonfly networks have already been implemented in current supercomputers and will continue to expand in future systems. Adaptive routing in Dragonfly topologies is critical for network performance. The traditional UGAL routing algorithm, which uses the valiant mechanism to select non-minimal paths, does not adequately consider the impact of high hops in non-minimal paths, often unnecessarily increasing the average path length, thereby increasing network latency and load. Furthermore, UGAL inaccurately estimates the congestion of the entire routing path based on local information, leading to suboptimal routing decisions that limit the algorithm's performance. In this paper, we propose PIAR, a novel pathimproved adaptive routing algorithm. PIAR dynamically selects paths based on the status of local and global channels, prioritizing non-minimal paths with fewer hops to reduce network latency and load, thereby improving network performance. Additionally, we present the microarchitecture of the routing computation unit. Our evaluation results demonstrate that, compared with advanced algorithms such as PAR _{\text {PH }} , TPR, and UGAL LE, PIAR achieves an average throughput improvement of 19.2 % and reduces latency by up to \mathbf{1 3. 4 \%} under the single synthetic traffic. Under mixed traffic, PIAR achieves an average throughput improvement of \mathbf{2 3. 6 \%} and reduces the latency by up to \mathbf{3 3. 8 \%} . For application workloads, PIAR achieves an average reduction of 24.0 % in packet latency. |
|---|---|
| ISSN: | 2168-9253 |
| DOI: | 10.1109/CLUSTER59342.2025.11186482 |