Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer

Bibliographic Details
Published in	2010 DoD High Performance Computing Modernization Program Users Group Conference pp. 517 - 523
Main Authors	Morris, G. R., McGruder, R. Y., Abed, K. H.
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2010
Subjects	Computers; Field programmable gate arrays; FPGA; Hardware; iterative solver; Jacobian matrices; Kernel; Program processors; reconfigurable computer; Sparse matrices; sparse matrix
ISBN	9781612849867 1612849865
DOI	10.1109/HPCMP-UGC.2010.30

More Information
Summary:	High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These interesting architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, thus making it difficult to accelerate certain kinds of floating-point kernels. Kernels with variable length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, the use of a high-level language (HLL)-based approach to accelerate such applications has been rather elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize the use of HLL-based programming, whenever possible, rather than HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x+y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax=b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation.
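
To make the kernel described in the summary concrete, the following is a minimal software-only sketch of a sparse Jacobi iteration for Ax=b in C, using 32-bit floats. It is not the paper's FPGA design or its Carte code. The compressed sparse row (CSR) layout, the `csr_matrix` type, and the names `jacobi_sweep` and `jacobi_solve` are assumptions made for illustration. The inner loop shows the two properties the summary calls problematic: its trip count varies from row to row, and each addition into `sum` depends on the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR (compressed sparse row) container. The layout is an
   assumption for this sketch, not taken from the paper. */
typedef struct {
    size_t n;              /* number of rows of the square matrix */
    const size_t *row_ptr; /* n + 1 offsets into col_idx and val */
    const size_t *col_idx; /* column index of each stored entry */
    const float *val;      /* stored entries as 32-bit floats */
} csr_matrix;

/* One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} a_ij * x_old[j]) / a_ii.
   Returns the largest absolute change of any component, or -1.0f if a row
   has a zero or missing diagonal entry. */
float jacobi_sweep(const csr_matrix *a, const float *b,
                   const float *x_old, float *x_new)
{
    assert(a != NULL && b != NULL && x_old != NULL && x_new != NULL);
    float max_delta = 0.0f;
    for (size_t i = 0; i < a->n; ++i) {
        float sum = 0.0f;
        float diag = 0.0f;
        /* Variable-length inner loop: the trip count is the number of
           stored entries in row i. */
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; ++k) {
            if (a->col_idx[k] == i)
                diag = a->val[k];
            else
                /* Loop-carried dependence: each add needs the previous sum. */
                sum += a->val[k] * x_old[a->col_idx[k]];
        }
        if (diag == 0.0f)
            return -1.0f;
        x_new[i] = (b[i] - sum) / diag;
        float delta = x_new[i] - x_old[i];
        if (delta < 0.0f)
            delta = -delta;
        if (delta > max_delta)
            max_delta = delta;
    }
    return max_delta;
}

/* Repeats sweeps until the largest change drops below tol. On entry x holds
   the initial guess; on exit it holds the latest iterate. scratch must have
   room for n floats. Returns the number of sweeps performed, or -1 if the
   diagonal is unusable or max_iter sweeps did not reach tol. */
int jacobi_solve(const csr_matrix *a, const float *b, float *x,
                 float *scratch, float tol, int max_iter)
{
    for (int it = 1; it <= max_iter; ++it) {
        float delta = jacobi_sweep(a, b, x, scratch);
        if (delta < 0.0f)
            return -1;
        for (size_t i = 0; i < a->n; ++i)
            x[i] = scratch[i];
        if (delta < tol)
            return it;
    }
    return -1;
}
```

A caller would fill a `csr_matrix` for a diagonally dominant system, pass an initial guess in `x`, and check the return value for convergence. On a deeply pipelined floating-point adder, the `sum +=` statement is the step that stalls, which is the role the summary assigns to the streaming accumulator.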