18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3
A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
In this paper, we first develop a novel architecture for fixed-point LU decomposition of streaming input matrices on FPGAs. Our architecture, based on a circular linear array, achieves the minimal latency and is resource-efficient. We then extend it, using a stacked-matrices approach, to a floating-point based architecture that achieves the minimal effective latency. Our design objective was to develop high-throughput and energy-efficient architectures for applications that require computing LU decomposition. We analyze (1) the impact of high-throughput, pipelined floating-point units (with different depths of pipelining and different performance) on the architecture's performance, and (2) the impact of algorithm-level design on the system-wide energy dissipation. We analyze the energy dissipation by capturing algorithmic and architectural details of the target FPGA device. We compare our architecture with a state-of-the-art architecture implemented on FPGAs with respect to latency, area, and energy. Our designs achieve a 10%-60% reduction in energy over that of the state-of-the-art architecture.
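For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of LU decomposition (Doolittle form, without pivoting). It is a reference illustration only and does not model the circular linear array, the stacked-matrices approach, or any other part of the FPGA architecture described above. The function name `lu_decompose` is hypothetical, and the absence of row exchanges is an assumption made to keep the sketch short.

```python
def lu_decompose(a):
    """Factor a square matrix a into L (unit lower) and U (upper) with a = L * U.

    Assumption: no pivoting is performed, so every pivot must be nonzero.
    """
    n = len(a)
    # L starts as the identity; U starts as a floating-point copy of the input.
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]

    for k in range(n):
        pivot = upper[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchange")
        for i in range(k + 1, n):
            # The multiplier that eliminates entry (i, k) is stored in L.
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            # Subtract the scaled pivot row from row i.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


if __name__ == "__main__":
    lower, upper = lu_decompose([[4, 3], [6, 3]])
    print(lower)
    print(upper)
```

The inner update (one division per row followed by a multiply-subtract per element) is the kind of arithmetic that, in a hardware design, would be mapped onto pipelined fixed-point or floating-point units.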
Citation:
Gokul Govindu, Seonil Choi, Viktor Prasanna, Vikash Daga, Sridhar Gangadharpalli, V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," ipdps, vol. 4, pp.149a, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3, 2004