Blocking LU Decomposition for FPGAs
| Published in | 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines pp. 109 - 112 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE 01.05.2010 |
| Subjects | |
| ISBN | 9781424471423 0769540562 9780769540566 1424471427 |
| DOI | 10.1109/FCCM.2010.25 |
| Summary | To efficiently perform large matrix LU decomposition on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs, which is applicable for matrices of arbitrary size. We introduce a high performance hardware design, which mainly consists of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work. |
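
The summary says the LU algorithm has to be blocked so that each step fits in limited local memory. This record does not describe the paper's actual blocking scheme or its PE array, so the following is only a minimal software sketch of the textbook right-looking block LU, assuming no pivoting (the input is taken to be diagonally dominant). The names `block_lu`, `matmul`, and the block size `b` are hypothetical and chosen for illustration. The last block is simply allowed to be smaller, which is one way to handle matrix sizes that are not a multiple of `b`.

```python
def block_lu(a, b):
    """Right-looking block LU without pivoting (illustrative sketch).

    a: square matrix as a list of lists; b: block size (hypothetical parameter).
    Returns (L, U) with L unit lower triangular and U upper triangular.
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy of the input
    for k in range(0, n, b):
        e = min(k + b, n)  # the last block may be smaller than b
        # 1. Factor the diagonal block A11 = L11 * U11 in place.
        for j in range(k, e):
            for i in range(j + 1, e):
                m[i][j] /= m[j][j]
                for c in range(j + 1, e):
                    m[i][c] -= m[i][j] * m[j][c]
        # 2. Row panel: solve L11 * U12 = A12 by forward substitution.
        for c in range(e, n):
            for j in range(k, e):
                for i in range(j + 1, e):
                    m[i][c] -= m[i][j] * m[j][c]
        # 3. Column panel: solve L21 * U11 = A21.
        for r in range(e, n):
            for j in range(k, e):
                m[r][j] /= m[j][j]
                for c in range(j + 1, e):
                    m[r][c] -= m[r][j] * m[j][c]
        # 4. Trailing update: A22 -= L21 * U12 (the matrix-multiply bulk of the work).
        for r in range(e, n):
            for c in range(e, n):
                s = 0.0
                for j in range(k, e):
                    s += m[r][j] * m[j][c]
                m[r][c] -= s
    L = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Plain dense matrix product, used only to check the factorization."""
    n, p, q = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(p)) for j in range(q)]
            for i in range(n)]


# Usage: a 5x5 diagonally dominant matrix with block size 2 (5 is not a multiple of 2).
n = 5
a = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
L, U = block_lu(a, 2)
p = matmul(L, U)
max_err = max(abs(p[i][j] - a[i][j]) for i in range(n) for j in range(n))
```

In this formulation only the current diagonal block and one panel need to be resident at a time, while step 4 is a block matrix multiplication. Production code would normally add partial pivoting for numerical stability, which this sketch omits.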