Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer
High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These interesting architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelera...
Saved in:
Published in | 2010 DoD High Performance Computing Modernization Program Users Group Conference pp. 517 - 523 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
01.06.2010
|
Subjects | |
Online Access | Get full text |
ISBN | 9781612849867 1612849865 |
DOI | 10.1109/HPCMP-UGC.2010.30 |
Cover
Summary: | High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These interesting architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, thus making it difficult to accelerate certain kinds of floating-point kernels. Kernels with variable length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, the use of a high-level language (HLL)-based approach to accelerate such applications has been rather elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize the use of HLL-based programming, whenever possible, rather than HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x+y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax=b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation. |
---|---|
ISBN: | 9781612849867 1612849865 |
DOI: | 10.1109/HPCMP-UGC.2010.30 |