18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3
A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs
Santa Fe, New Mexico
April 26-April 30
ISBN: 0-7695-2132-0
Gokul Govindu, University of Southern California
Seonil Choi, University of Southern California
Viktor Prasanna, University of Southern California
Vikash Daga, Satyam Computer Services Ltd.
Sridhar Gangadharpalli, Satyam Computer Services Ltd.
V. Sridhar, Satyam Computer Services Ltd.
In this paper, we first develop a novel architecture for fixed-point LU decomposition of streaming input matrices on FPGAs. Our architecture, based on a circular linear array, achieves the minimal latency and is resource-efficient. We then extend it, using a stacked-matrices approach, to a floating-point architecture that achieves the minimal effective latency. Our design objective was to develop high-throughput and energy-efficient architectures for applications that require computing LU decomposition. We analyze (1) the impact of high-throughput, pipelined floating-point units (with different pipeline depths and different performance) on the architecture's performance, and (2) the impact of algorithm-level design on the system-wide energy dissipation. We analyze the energy dissipation by capturing algorithmic and architectural details of the target FPGA device. We compare our architecture with a state-of-the-art architecture implemented on FPGAs with respect to latency, area, and energy. Our designs achieve a 10%-60% reduction in energy over that of the state-of-the-art architecture.
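For readers unfamiliar with the operation the abstract targets, the following is a minimal software sketch of LU decomposition (Doolittle form, no pivoting). It is a plain reference computation only and is not the circular linear array or stacked-matrices design described in the paper; the function name `lu_decompose` and the choice to skip pivoting are assumptions made for illustration.

```python
def lu_decompose(a):
    """Factor a square matrix a into L (unit lower) and U (upper), so a = L * U.

    Illustrative sketch: Doolittle elimination without pivoting.
    It fails on a zero pivot instead of swapping rows.
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for k in range(n):
        if u[k][k] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]  # multiplier stored in L
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]  # eliminate entry below the pivot
    return l, u


if __name__ == "__main__":
    l, u = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
    print(l)
    print(u)
```

The inner loops perform one division and a run of multiply-subtract operations per pivot, which is the kind of arithmetic a pipelined floating-point unit would carry out in hardware.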
Citation:
Gokul Govindu, Seonil Choi, Viktor Prasanna, Vikash Daga, Sridhar Gangadharpalli, V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," ipdps, vol. 4, pp.149a, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3, 2004