Issue No.10 - October (2007 vol.18)
pp: 1377-1392
ABSTRACT
Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations, such as matrix-vector multiplication and dot product, involve the reduction of a sequentially produced stream of values. Unfortunately, because FPGA-based floating-point units are pipelined, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design tradeoffs among the number of adders, buffer size, and latency, and propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implement our designs and present performance and area results.
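The data hazard described in the abstract can be illustrated with a toy software model. The sketch below is not the paper's tree-traversal or striding design; it is a generic comparison, under assumed parameters, between a single running sum that stalls on a pipelined adder and a set of interleaved partial sums that accepts one input per cycle. The pipeline depth `ADDER_LATENCY`, the function names, and the simplified cycle counts are all hypothetical.

```python
ADDER_LATENCY = 4  # hypothetical pipeline depth of the floating-point adder


def stalled_accumulate(values, latency=ADDER_LATENCY):
    """Single running sum: every add depends on the previous result,
    so the next add cannot issue until the current one leaves the pipeline."""
    total = 0.0
    cycles = 0
    for v in values:
        total += v
        cycles += latency  # wait out the full adder latency (the data hazard)
    return total, cycles


def interleaved_accumulate(values, latency=ADDER_LATENCY):
    """Keep `latency` independent partial sums so one input issues per cycle.
    Input i updates partial[i % latency], whose previous update was issued
    `latency` cycles earlier and has therefore already completed."""
    partial = [0.0] * latency
    for i, v in enumerate(values):
        partial[i % latency] += v
    cycles = len(values) + latency  # one issue per cycle, then drain the pipeline
    # Combine the partial sums serially (a deliberately simple final stage).
    total = partial[0]
    for p in partial[1:]:
        total += p
        cycles += latency
    return total, cycles


if __name__ == "__main__":
    data = [float(i) for i in range(1, 17)]
    print(stalled_accumulate(data))
    print(interleaved_accumulate(data))
```

In this model the interleaved version finishes in fewer cycles because the adder is never idle while inputs arrive, at the cost of extra storage for the partial sums and a final combining stage. Because floating-point addition is not associative, changing the order of additions in this way can change the rounded result in general.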
INDEX TERMS
G.1.0.g Parallel algorithms, C.3.e Reconfigurable hardware
CITATION
Ling Zhuo, Gerald R. Morris, Viktor K. Prasanna, "High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs", IEEE Transactions on Parallel & Distributed Systems, vol. 18, no. 10, pp. 1377-1392, October 2007, doi:10.1109/TPDS.2007.1068