The Community for Technology Leaders
2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (2010)
Charlotte, North Carolina, USA
May 2, 2010 to May 4, 2010
ISBN: 978-0-7695-4056-6
pp: 109-112
ABSTRACT
To efficiently perform large matrix LU decomposition on FPGAs with limited local memory, the original algorithm needs to be blocked. In this paper, we propose a block LU decomposition algorithm for FPGAs, which is applicable for matrices of arbitrary size. We introduce a high performance hardware design, which mainly consists of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. A total of 36 PEs can be integrated into a Xilinx Virtex-5 xc5vlx330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz, which outperforms previous work.
INDEX TERMS
CITATION

Y. Dou, G. D. Peterson and G. Wu, "Blocking LU Decomposition for FPGAs," 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines(FCCM), Charlotte, North Carolina, USA, 2010, pp. 109-112.
doi:10.1109/FCCM.2010.25
95 ms
(Ver 3.3 (11022016))