Vicis: a reliable network for unreliable silicon

Bibliographic Details
Published in: 2009 46th ACM/IEEE Design Automation Conference, pp. 812-817
Main Authors: Fick, David; DeOrio, Andrew; Hu, Jin; Bertacco, Valeria; Blaauw, David; Sylvester, Dennis
Format: Conference Proceeding
Language: English
Published: New York, NY, USA: ACM; IEEE, 26.07.2009
Series: ACM Conferences
ISBN: 9781605584973; 1605584975
ISSN: 0738-100X
DOI: 10.1145/1629911.1630119


Abstract Process scaling has given designers billions of transistors to work with. As feature sizes near the atomic scale, extensive variation and wearout inevitably make margining uneconomical or impossible. The ElastIC project seeks to address this by creating a large-scale chip-multiprocessor that can self-diagnose, adapt, and heal. Creating large, flexible designs in this environment naturally lends itself to the repetitive nature of network-on-chip (NoC), but the loss of a single link or router will result in complete network failure. In this work we present Vicis, an ElastIC-style NoC that can tolerate the loss of many network components due to wearout-induced hard faults. Vicis uses the inherent redundancy in the network and its routers to maintain correct operation while incurring a much lower area overhead than previously proposed N-modular redundancy (NMR) based solutions. Each router has a built-in self-test (BIST) that diagnoses the locations of hard faults and runs a number of algorithms to best use ECC, port swapping, and a crossbar bypass bus to mitigate them. The routers also work together to run distributed algorithms that solve network-wide problems, protecting the network against critical failures in individual routers. We show that with stuck-at fault rates as high as 1 in 2000 gates, Vicis will continue to operate with approximately half of its routers still functional and communicating.
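To give a feel for the fault rate quoted in the abstract, the following is a rough back-of-the-envelope sketch, not anything taken from the paper. It assumes that "1 in 2000 gates" can be read as an independent per-gate stuck-at probability, and it uses a purely hypothetical router size of 20,000 gates; the function names are illustrative as well.

```python
# Back-of-the-envelope sketch. Assumptions (not from the paper):
#   - each gate fails independently with the quoted stuck-at rate
#   - ROUTER_GATES is a hypothetical router size chosen for illustration


def expected_faults(gate_count: int, fault_rate: float) -> float:
    """Expected number of faulty gates under independent per-gate faults."""
    return gate_count * fault_rate


def prob_fault_free(gate_count: int, fault_rate: float) -> float:
    """Probability that none of the gates is faulty."""
    return (1.0 - fault_rate) ** gate_count


FAULT_RATE = 1 / 2000   # rate quoted in the abstract
ROUTER_GATES = 20_000   # hypothetical gate count per router

if __name__ == "__main__":
    print("expected faulty gates:", expected_faults(ROUTER_GATES, FAULT_RATE))
    print("P(router fault-free): ", prob_fault_free(ROUTER_GATES, FAULT_RATE))
```

Under these assumptions a router is very unlikely to be completely fault-free, which is consistent with the abstract's emphasis on diagnosing and working around faults rather than avoiding them.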
Author_xml – sequence: 1
  givenname: David
  surname: Fick
  fullname: Fick, David
  organization: University of Michigan, Ann Arbor, MI
– sequence: 2
  givenname: Andrew
  surname: DeOrio
  fullname: DeOrio, Andrew
  organization: University of Michigan, Ann Arbor, MI
– sequence: 3
  givenname: Jin
  surname: Hu
  fullname: Hu, Jin
  organization: University of Michigan, Ann Arbor, MI
– sequence: 4
  givenname: Valeria
  surname: Bertacco
  fullname: Bertacco, Valeria
  organization: University of Michigan, Ann Arbor, MI
– sequence: 5
  givenname: David
  surname: Blaauw
  fullname: Blaauw, David
  organization: University of Michigan, Ann Arbor, MI
– sequence: 6
  givenname: Dennis
  surname: Sylvester
  fullname: Sylvester, Dennis
  organization: University of Michigan, Ann Arbor, MI
ContentType Conference Proceeding
Copyright 2009 ACM
Copyright_xml – notice: 2009 ACM
DOI 10.1145/1629911.1630119
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Engineering
EndPage 817
ExternalDocumentID 5227053
Genre orig-research
IsPeerReviewed false
IsScholarly true
Keywords fault tolerance
torus
Network-on-Chip
hard faults
N-modular redundancy
reconfiguration
built-in-self-test
LCCN 85644924
Language English
License Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org
LinkModel DirectLink
MeetingName DAC '09: The 46th Annual Design Automation Conference 2009
PageCount 6
PublicationCentury 2000
PublicationDate 20090726
2009-July
PublicationDateYYYYMMDD 2009-07-26
2009-07-01
PublicationDate_xml – month: 07
  year: 2009
  text: 20090726
  day: 26
PublicationDecade 2000
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle 2009 46th ACM/IEEE Design Automation Conference
PublicationTitleAbbrev DAC
PublicationYear 2009
Publisher ACM
IEEE
Publisher_xml – name: ACM
– name: IEEE
StartPage 812
SubjectTerms Built-in-Self-Test
Electric breakdown
Fault diagnosis
Fault tolerance
Hard Faults
N-Modular Redundancy
Network-on-a-chip
Network-on-Chip
Power system management
Reconfiguration
Redundancy
Silicon
System recovery
Telecommunication traffic
Testing
Torus
Subtitle a reliable network for unreliable silicon
Title Vicis
URI https://ieeexplore.ieee.org/document/5227053