Per-packet global congestion estimation for fast packet delivery in networks-on-chip
Networks-on-chip (NOCs) are becoming the de facto communication fabric to connect cores and cache banks in chip multiprocessors (CMPs). Routing algorithms, as one of the key components that influence NOC latency, are the subject of extensive research. Static routing algorithms have low cost but unli...
Saved in:
| Published in | The Journal of supercomputing Vol. 71; no. 9; pp. 3419 - 3439 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published |
New York
Springer US
01.09.2015
|
| Subjects | |
| Online Access | Get full text |
| ISSN | 0920-8542 1573-0484 |
| DOI | 10.1007/s11227-015-1439-3 |
Cover
| Summary: | Networks-on-chip (NOCs) are becoming the de facto communication fabric to connect cores and cache banks in chip multiprocessors (CMPs). Routing algorithms, as one of the key components that influence NOC latency, are the subject of extensive research. Static routing algorithms have low cost but unlike adaptive routing algorithms, do not perform well under non-uniform or bursty traffic. Adaptive routing algorithms estimate congestion levels of output ports to avoid routing traffic over congested ports. As global adaptive routing algorithms are not restricted to local information for congestion estimation, they are the prime candidates for balancing traffic in NOCs. Unfortunately, destinations of packets are not considered for congestion estimation in existing global adaptive routing algorithms. We will show that having identical congestion estimates for packets with different destinations prevents global adaptive routing algorithms from reaching their peak potential. In this work, we introduce
Fast
, a low-cost global adaptive routing algorithm that estimates congestion levels of output ports on a per-packet basis. The simulation results reveal that
Fast
achieves lower latency and higher throughput as compared to those of other adaptive routing algorithms across all workloads examined.
Fast
increases the throughput of an
8
×
8
network by 54, 30, and 16 % as compared to DOR, Local, and RCA on a synthetic traffic profile. On realistic benchmarks,
Fast
achieves 5 % average, and 12 % maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP as compared to the state of the art. |
|---|---|
| ISSN: | 0920-8542 1573-0484 |
| DOI: | 10.1007/s11227-015-1439-3 |