A Sparse Matrix Personality for the Convey HC-1

Bibliographic Details
Published in	2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 1-8
Main Authors	Nagar, K. K.; Bakos, J. D.
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2011
Subjects	Adders; Arrays; Coprocessors; Field programmable gate arrays; floating point accumulation; Pipelines; reconfigurable computing; reduction; Sparse matrices; sparse matrix; SpMV
ISBN	9781612842776 1612842771
DOI	10.1109/FCCM.2011.60

Summary:	In this paper we describe a double-precision floating-point sparse matrix-vector multiplier (SpMV) and its performance as implemented on a Convey HC-1 reconfigurable computer. The primary contributions of this work are a novel streaming reduction architecture for floating-point accumulation, a novel on-chip cache optimized for streaming compressed sparse row (CSR) matrices, and end-to-end integration with the HC-1's system, programming model, and runtime environment. The design is composed of 32 parallel processing elements (PEs), each connected to the HC-1's coprocessor memory and each containing a streaming multiply-accumulator and a local vector cache. When used on the HC-1, each PE has a peak throughput of 300 double-precision MFLOP/s, giving a total peak throughput of 9.6 GFLOP/s. For our test matrices, we demonstrate up to 40% of the peak performance and compare these results with results obtained using the cuSPARSE library on an NVIDIA Tesla S1070 GPU. In most cases our implementation exceeds the performance of the GPU.
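
For readers unfamiliar with the CSR format and the throughput figures quoted in the summary, the following is a minimal software sketch, not the authors' hardware design. It shows a plain CSR matrix-vector product and the arithmetic behind the 9.6 GFLOP/s peak. The function names `csr_spmv` and `peak_gflops`, the array names, and the 3x3 example matrix are assumptions made for illustration; only the PE count and the per-PE peak rate come from the summary.

```python
# Figures taken from the summary.
NUM_PES = 32            # parallel processing elements
PE_PEAK_MFLOPS = 300    # peak double-precision MFLOP/s per PE


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i in `values`
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Multiply-accumulate over the nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def peak_gflops(num_pes=NUM_PES, pe_mflops=PE_PEAK_MFLOPS):
    """Aggregate peak throughput in GFLOP/s."""
    return num_pes * pe_mflops / 1000.0


# Hypothetical 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
```

With these numbers, 32 PEs at 300 MFLOP/s each give 9.6 GFLOP/s, so the reported "up to 40% of the peak" corresponds to at most about 3.84 GFLOP/s.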