From the March 2012 Issue
A High Performance and Memory Efficient LU Decomposer on FPGAs
By Guiming Wu, Yong Dou, Junqing Sun, & Gregory D. Peterson
LU decomposition for dense matrices is an important linear algebra kernel that is widely used in both scientific and engineering applications. To efficiently perform large matrix LU decomposition on FPGAs with limited local memory, a block LU decomposition algorithm on FPGAs applicable to arbitrary matrix size is proposed. Our algorithm applies a series of transformations, including loop blocking and space-time mapping, onto sequential nonblocking LU decomposition. We also introduce a high performance and memory efficient hardware architecture, which mainly consists of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. Our design can achieve optimum performance under various hardware resource constraints.
View the PDF of this article
View this issue in the digital library