Optimizing the location of ECC protection in Network-on-Chip

Bibliographic Details
Published in CODES+ISSS: 2016 International Conference on Hardware/Software Codesign and System Synthesis, October 2-7, 2016, Pittsburgh Marriott City Center, Pittsburgh, PA, pp. 1-10
Main Authors Junshi Wang, Letian Huang, Qiang Li, Guangjun Li, Axel Jantsch
Format Conference Proceeding
Language English
Published ACM 01.10.2016
DOI 10.1145/2968456.2968460

Summary: Communication in Networks-on-Chip (NoCs) may be subject to errors. Error Correcting Codes (ECCs) can be used to tolerate transient faults in flits caused by Single Event Upsets (SEUs). ECC can significantly improve the reliability of a NoC at the cost of extra area and power consumption. However, ECC units (encoders and decoders) may themselves suffer from SEU faults, which can lead to over-protection, meaning that providing more ECC units does not further improve reliability. This work analyzes reliability in NoCs, measured as the fraction of correctly received flits, considering the SEU errors introduced by both the protected circuits and the ECC units. The results show the potential for over-protection. Based on this analysis, we maximize protection by optimizing the location of the ECC units. We study the reliability of an 8 × 8 mesh NoC under six ECC protection strategies and conclude that one strategy, called SLOPE, achieves the best trade-off among the six when reliability, latency, area, energy consumption, and design space are considered together.
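The over-protection effect described in the summary can be illustrated with a toy model. The sketch below is not the paper's analysis: the function name `flit_reliability`, its parameters, and the fault model are all assumptions made for illustration. It assumes a single-error-correcting code, at most one bit error added per hop with probability `p_link`, and an ECC unit (decoder plus re-encoder) that is itself hit by an SEU with probability `q_ecc`, in which case the flit is counted as lost. An ECC unit sits after every `spacing` hops and always at the destination.

```python
def flit_reliability(hops, p_link, q_ecc, spacing):
    """Probability that a flit is received correctly (toy model, see assumptions).

    hops    : number of hops on the path
    p_link  : assumed probability that one hop adds a single bit error
    q_ecc   : assumed probability that an ECC unit is itself upset by an SEU
    spacing : an ECC unit is placed after every `spacing` hops
              (spacing == hops means end-to-end protection only)
    """
    p_clean, p_one_err = 1.0, 0.0  # P(0 errors), P(exactly 1 error) so far
    for hop in range(1, hops + 1):
        # Traverse one hop; a second accumulated error is uncorrectable
        # and silently drops out of the tracked probability mass.
        p_clean, p_one_err = (
            p_clean * (1 - p_link),
            p_clean * p_link + p_one_err * (1 - p_link),
        )
        if hop % spacing == 0 or hop == hops:
            # A working ECC unit corrects up to one error;
            # a faulty unit (probability q_ecc) loses the flit.
            p_clean, p_one_err = (p_clean + p_one_err) * (1 - q_ecc), 0.0
    return p_clean


if __name__ == "__main__":
    hops = 14  # longest minimal path in an 8 x 8 mesh (corner to corner)
    for p_link, q_ecc in [(1e-3, 1e-3), (5e-2, 1e-6)]:
        per_hop = flit_reliability(hops, p_link, q_ecc, spacing=1)
        end_to_end = flit_reliability(hops, p_link, q_ecc, spacing=hops)
        print(f"p_link={p_link}, q_ecc={q_ecc}: "
              f"per-hop={per_hop:.6f}, end-to-end={end_to_end:.6f}")
```

Under these assumptions, when ECC units are about as fault-prone as the hops they protect, per-hop protection gives lower reliability than end-to-end protection, while with nearly fault-free ECC units the ordering reverses. This is the kind of trade-off that makes the location of ECC units worth optimizing.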