Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer

Bibliographic Details
Published in 2010 DoD High Performance Computing Modernization Program Users Group Conference, pp. 517-523
Main Authors Morris, G. R., McGruder, R. Y., Abed, K. H.
Format Conference Proceeding
Language English
Published IEEE 01.06.2010
ISBN 9781612849867, 1612849865
DOI 10.1109/HPCMP-UGC.2010.30

Abstract High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These interesting architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, thus making it difficult to accelerate certain kinds of floating-point kernels. Kernels with variable-length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, the use of a high-level language (HLL)-based approach to accelerate such applications has been rather elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize the use of HLL-based programming, whenever possible, rather than HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x+y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax=b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation.
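
To make the kernel described in the abstract concrete, the following is a minimal software-only sketch of a 32-bit floating-point Jacobi solver for Ax=b with A stored in compressed sparse row (CSR) form. It is not the authors' Carte implementation. The function names, the CSR layout, the `LANES` constant, and the use of several interleaved partial sums (a common software analogy for hiding adder latency in a variable-length accumulation) are all assumptions made for illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical number of independent partial sums; a stand-in for the
   latency of a pipelined floating-point adder. Chosen for illustration. */
#define LANES 4

/* Accumulate the off-diagonal products of row i and report the diagonal entry.
   A single running sum would make every addition wait on the previous one
   (the loop-carried dependence); rotating over LANES partial sums removes
   that dependence between consecutive additions. */
static float row_offdiag_sum(const int *row_ptr, const int *col,
                             const float *val, const float *x,
                             int i, float *diag)
{
    float partial[LANES] = {0.0f};
    int lane = 0;
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {  /* variable-length row */
        if (col[k] == i) {
            *diag = val[k];
        } else {
            partial[lane] += val[k] * x[col[k]];
            lane = (lane + 1) % LANES;
        }
    }
    float sum = 0.0f;
    for (int l = 0; l < LANES; l++) {
        sum += partial[l];               /* final reduction of the partial sums */
    }
    return sum;
}

/* Jacobi iteration: x_new[i] = (b[i] - sum_{j != i} a_ij * x[j]) / a_ii.
   x holds the initial guess on entry and the result on exit; scratch must
   hold n floats. Returns the iteration count, or -1 if max_iter is reached
   before the largest component change drops below tol. */
int jacobi_solve(const int *row_ptr, const int *col, const float *val,
                 const float *b, float *x, float *scratch, int n,
                 float tol, int max_iter)
{
    assert(n > 0 && max_iter > 0);
    for (int it = 1; it <= max_iter; it++) {
        float max_delta = 0.0f;
        for (int i = 0; i < n; i++) {
            float diag = 0.0f;
            float sum = row_offdiag_sum(row_ptr, col, val, x, i, &diag);
            assert(diag != 0.0f);        /* Jacobi needs a nonzero diagonal */
            scratch[i] = (b[i] - sum) / diag;
            float delta = fabsf(scratch[i] - x[i]);
            if (delta > max_delta) {
                max_delta = delta;
            }
        }
        for (int i = 0; i < n; i++) {
            x[i] = scratch[i];           /* all updates use the previous iterate */
        }
        if (max_delta < tol) {
            return it;
        }
    }
    return -1;
}
```

A caller would pass the three CSR arrays, the right-hand side, an initial guess, and a scratch buffer of the same length. Convergence is only guaranteed for suitable matrices, for example strictly diagonally dominant ones.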
Author Affiliations
Morris, G. R. (gerald.r.morris@us.army.mil): Eng. R&D Center, DoD Supercomput. Resource Center (ERDC DSRC), US Army, Vicksburg, MS, USA
McGruder, R. Y. (ricky.y.mcgruder@studentsjsums.edu): Dept. of Comput. Eng., Jackson State Univ., Jackson, MS, USA
Abed, K. H. (khalid.h.abed@jsums.edu): Dept. of Comput. Eng., Jackson State Univ., Jackson, MS, USA
LCCN 2011922947
Subjects Computers; Field programmable gate arrays; FPGA; Hardware; iterative solver; Jacobian matrices; Kernel; Program processors; reconfigurable computer; Sparse matrices; sparse matrix
URI https://ieeexplore.ieee.org/document/6018033