Blocking LU Decomposition for FPGAs

Bibliographic Details
Published in: 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 109-112
Main Authors: Guiming Wu, Yong Dou, Gregory D. Peterson
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2010
ISBN: 9781424471423, 0769540562, 9780769540566, 1424471427
DOI: 10.1109/FCCM.2010.25

Summary: To perform LU decomposition of large matrices efficiently on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
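
To illustrate what "blocking" means here, the following is a minimal software sketch of a textbook right-looking block LU decomposition without pivoting. It is not the paper's hardware algorithm or PE schedule, which this record does not describe; the names `block_lu` and `matmul`, the block size, and the no-pivoting simplification are all assumptions made for illustration. The sketch only shows how the factorization can proceed one block at a time, so that each step touches a limited working set, and how a matrix size that is not a multiple of the block size can be handled by letting the last block be smaller.

```python
def block_lu(a, block):
    """Right-looking blocked LU without pivoting (unit-diagonal L).

    Illustrative assumption: the input needs no pivoting (for example,
    it is strictly diagonally dominant). Returns (L, U) as lists of lists.
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; L and U are packed into m
    for k in range(0, n, block):
        e = min(k + block, n)  # last block may be smaller than `block`
        # 1. Unblocked LU of the diagonal block m[k:e][k:e].
        for j in range(k, e):
            for i in range(j + 1, e):
                m[i][j] /= m[j][j]
                for c in range(j + 1, e):
                    m[i][c] -= m[i][j] * m[j][c]
        # 2. Row panel: solve L11 * U12 = A12 by forward substitution.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    m[i][c] -= m[i][j] * m[j][c]
        # 3. Column panel: solve L21 * U11 = A21.
        for r in range(e, n):
            for j in range(k, e):
                for p in range(k, j):
                    m[r][j] -= m[r][p] * m[p][j]
                m[r][j] /= m[j][j]
        # 4. Trailing submatrix update: A22 -= L21 * U12.
        for r in range(e, n):
            for c in range(e, n):
                s = 0.0
                for p in range(k, e):
                    s += m[r][p] * m[p][c]
                m[r][c] -= s
    # Unpack the combined storage into separate L and U factors.
    L = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Plain dense matrix product, used only to check L * U against A."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Hypothetical 5x5 test matrix, strictly diagonally dominant by rows.
A = [[10.0, 2.0, 3.0, 1.0, 0.0],
     [2.0, 12.0, 1.0, 0.0, 3.0],
     [1.0, 1.0, 9.0, 2.0, 2.0],
     [0.0, 3.0, 2.0, 11.0, 1.0],
     [4.0, 0.0, 1.0, 2.0, 13.0]]
L, U = block_lu(A, 2)  # 5 is not a multiple of 2: the last block is 1x1
P = matmul(L, U)
max_err = max(abs(P[i][j] - A[i][j]) for i in range(5) for j in range(5))
print("max reconstruction error:", max_err)
```

In this sketch, step 4 (the trailing update) is a matrix-matrix multiply and dominates the operation count for large matrices, which is the part a parallel array of multiply-accumulate units would typically be expected to accelerate; how the paper actually maps the work onto its PEs is not stated in this record.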