Checkpoint/Restart and Beyond: Resilient High Performance Computing with FPGAs
| Published in | 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 162-169 |
|---|---|
| Main Authors | Schmidt, A G; Bin Huang; Sass, R; French, M |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE 01.05.2011 |
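| Subjects | Amplitude modulation; Checkpoint Restart; Context; Field programmable gate arrays; FPGA; Hardware; High Performance Computing; Monitoring; Reconfigurable Computing; Registers; Resiliency; Software |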
| ISBN | 9781612842776 1612842771  | 
| DOI | 10.1109/FCCM.2011.22 | 
| Abstract | As FPGA resources continue to increase, FPGAs offer features that are attractive to the High Performance Computing community. These include power-efficient computation and application-specific acceleration, as well as tighter integration between compute and I/O resources. This paper considers the ability of an FPGA to address another increasingly important feature: resiliency. Specifically, it presents a minimally invasive monitoring infrastructure that operates over a sideband network. The infrastructure includes a multi-chip protocol, IP cores that implement the protocol, and a tool to instrument existing hardware accelerator FPGA designs. To demonstrate the functionality, the system has been implemented on a cluster of FPGA devices running off-the-shelf MPI and Linux. We demonstrate integrated software and hardware accelerator checkpointing with restart under a variety of injected faults. |
    
|---|---|
| Author | Bin Huang; Schmidt, A G; Sass, R; French, M |
    
| Author_xml | – sequence: 1 givenname: A G surname: Schmidt fullname: Schmidt, A G email: andrewgschmidt@gmail.com organization: Reconfigurable Comput. Syst. Lab., Univ. of North Carolina at Charlotte, Charlotte, NC, USA – sequence: 2 surname: Bin Huang fullname: Bin Huang email: bhuang2@uncc.edu organization: Reconfigurable Comput. Syst. Lab., Univ. of North Carolina at Charlotte, Charlotte, NC, USA – sequence: 3 givenname: R surname: Sass fullname: Sass, R email: rsass@uncc.edu organization: Reconfigurable Comput. Syst. Lab., Univ. of North Carolina at Charlotte, Charlotte, NC, USA – sequence: 4 givenname: M surname: French fullname: French, M email: mfrench@isi.edu organization: Inf. Sci. Inst., Univ. of Southern California, Arlington, VA, USA |
    
| ContentType | Conference Proceeding | 
    
| EISBN | 0769543014 9780769543017  | 
    
| EndPage | 169 | 
    
| ExternalDocumentID | 5771268 | 
    
| Genre | orig-research | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | false | 
    
| PageCount | 8 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2011-May | 
    
| PublicationDateYYYYMMDD | 2011-05-01 | 
    
| PublicationDate_xml | – month: 05 year: 2011 text: 2011-May  | 
    
| PublicationDecade | 2010 | 
    
| PublicationTitle | 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines | 
    
| PublicationTitleAbbrev | fccm | 
    
| PublicationYear | 2011 | 
    
| Publisher | IEEE | 
    
| Publisher_xml | – name: IEEE | 
    
| SourceID | ieee | 
    
| SourceType | Publisher | 
    
| StartPage | 162 | 
    
| SubjectTerms | Amplitude modulation; Checkpoint Restart; Context; Field programmable gate arrays; FPGA; Hardware; High Performance Computing; Monitoring; Reconfigurable Computing; Registers; Resiliency; Software |
    
| Title | Checkpoint/Restart and Beyond: Resilient High Performance Computing with FPGAs | 
    
| URI | https://ieeexplore.ieee.org/document/5771268 | 
    
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781612842776/sc.gif&client=summon&freeimage=true |