Blocking LU Decomposition for FPGAs
| Published in | 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 109-112 |
|---|---|
| Main Authors | , , | 
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.05.2010 |
| ISBN | 9781424471423 0769540562 9780769540566 1424471427  | 
| DOI | 10.1109/FCCM.2010.25 | 
| Summary: | To perform LU decomposition of large matrices efficiently on FPGAs with limited local memory, the original algorithm must be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs that is applicable to matrices of arbitrary size. We introduce a high-performance hardware design, consisting mainly of a linear array of processing elements (PEs), to implement the algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work. |
|---|---|
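
For readers unfamiliar with the term, the general idea of blocking an LU decomposition can be sketched in software. The following is a minimal illustration of a textbook right-looking block LU factorization, not the algorithm or hardware design from the paper, which this record does not describe in detail. It assumes no pivoting is needed (for example, a diagonally dominant matrix); the function name `block_lu` and the block-size parameter `bs` are hypothetical names chosen for this example.

```python
def block_lu(a, bs):
    """Right-looking block LU without pivoting (illustrative sketch only).

    a  : square matrix as a list of lists (left unmodified)
    bs : block size, a hypothetical tuning parameter
    Returns (L, U) with L unit lower triangular and U upper triangular.
    """
    n = len(a)
    m = [[float(x) for x in row] for row in a]  # work on a copy

    for k in range(0, n, bs):
        e = min(k + bs, n)  # the current block covers indices k..e-1

        # Step 1: factor the column panel (diagonal block plus rows below it).
        for j in range(k, e):
            for i in range(j + 1, n):
                m[i][j] /= m[j][j]  # assumes a nonzero pivot
                for c in range(j + 1, e):
                    m[i][c] -= m[i][j] * m[j][c]

        # Step 2: forward substitution with the unit lower-triangular
        # diagonal block to obtain the block row of U to its right.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    m[i][c] -= m[i][j] * m[j][c]

        # Step 3: update the trailing submatrix with a matrix product.
        for i in range(e, n):
            for j in range(k, e):
                lij = m[i][j]
                for c in range(e, n):
                    m[i][c] -= lij * m[j][c]

    # Split the packed result into explicit L and U factors.
    L = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U
```

In this sketch only one panel and one block row are active at a time before the trailing update. That locality is what makes blocked variants attractive when fast local memory is limited, as it is on an FPGA.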