Resource Conscious Diagnosis and Reconfiguration for NoC Permanent Faults
Networks-on-chip (NoCs) have been increasingly adopted in recent years due to the extensive integration of many components in modern multicore processors and system-on-chip designs. At the same time, transistor reliability is becoming a major concern due to the continuous scaling of silicon. As the...
Saved in:
| Published in | IEEE transactions on computers Vol. 65; no. 7; pp. 2241 - 2256 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published |
New York
IEEE
01.07.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0018-9340 1557-9956 |
| DOI | 10.1109/TC.2015.2479586 |
Cover
| Summary: | Networks-on-chip (NoCs) have been increasingly adopted in recent years due to the extensive integration of many components in modern multicore processors and system-on-chip designs. At the same time, transistor reliability is becoming a major concern due to the continuous scaling of silicon. As the sole medium of on-chip communication, it is critical for a NoC to be able to tolerate many permanent transistor failures. In this paper, we propose uDIREC, a unified framework for permanent fault diagnosis and subsequent reconfiguration in NoCs, which provides graceful performance degradation with an increasing number of faults. Upon in-field transistor failures, uDIREC leverages a fine-resolution diagnosis mechanism to disable faulty components very sparingly. At its core, uDIREC employs MOUNT, a novel routing algorithm to find reliable and deadlock-free routes that utilize all the still-functional links in the NoC. We implement uDIREC's reconfiguration as a truly-distributed hardware solution, still keeping the area overhead at a minimum. We also propose a software-implemented reconfiguration that provides greater integration with our software-based diagnosis scheme, at the cost of distributed nature of implementation. Regardless of the adopted implementation scheme, uDIREC places no restriction on topology, router architecture and number and location of faults. Experimental results show that uDIREC, implemented in a 64-node NoC, drops 3<inline-formula> <tex-math notation="LaTeX">\times</tex-math> <inline-graphic xlink:type="simple" xlink:href="parikh-ieq1-2479586.gif"/> </inline-formula> fewer nodes and provides greater than 25 percent throughput improvement (beyond 15 faults) when compared to other state-of-the-art fault-tolerance solutions. uDIREC's improvement over prior-art grows further with more faults, making it a effective NoC reliability solution for a wide range of fault rates. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 0018-9340 1557-9956 |
| DOI: | 10.1109/TC.2015.2479586 |