High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
October 2007 (vol. 18, no. 10), pp. 1377-1392
Abstract:
Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations, such as matrix-vector multiplication and dot product, involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design tradeoffs among the number of adders, buffer size, and latency, and propose high-performance, area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
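To make the data-hazard problem concrete: a deeply pipelined floating-point adder with alpha stages cannot consume an operand that depends on a sum still in flight, so a naive accumulator can issue only one addition every alpha cycles. The C sketch below is a minimal software model of the general interleaved-partial-sums idea that underlies such reduction circuits; it is an illustration under stated assumptions, not the authors' tree-traversal or striding designs, and the pipeline depth ALPHA and the function name reduce_stream are hypothetical.

#include <stdio.h>

/* Hypothetical adder pipeline depth; real FPGA floating-point adders
 * are often around 10 stages, but this value is only illustrative. */
#define ALPHA 8

/* Reduce one set of n sequentially delivered values.  A naive
 * "sum += x[i]" must wait ALPHA cycles for each result before issuing
 * the next addition.  Rotating among ALPHA independent partial sums
 * removes that read-after-write dependence, so a new addition can be
 * issued every cycle; the partials are then combined in a small tree. */
double reduce_stream(const double *x, int n)
{
    double partial[ALPHA] = {0.0};

    for (int i = 0; i < n; i++)
        partial[i % ALPHA] += x[i];   /* consecutive adds are independent */

    /* Tree-style combination of the ALPHA partial sums (ALPHA is a
     * power of two, so the halving loop pairs them exactly). */
    for (int stride = ALPHA / 2; stride > 0; stride /= 2)
        for (int j = 0; j < stride; j++)
            partial[j] += partial[j + stride];

    return partial[0];
}

int main(void)
{
    double x[100];
    for (int i = 0; i < 100; i++)
        x[i] = i + 1.0;               /* 1 + 2 + ... + 100 = 5050 */
    printf("%g\n", reduce_stream(x, 100));
    return 0;
}

As in the paper's designs, this reordering changes the association order of the floating-point additions, so the result can differ slightly from a strictly sequential sum.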

Index Terms:
G.1.0.g Parallel algorithms, C.3.e Reconfigurable hardware
Citation:
Ling Zhuo, Gerald R. Morris, Viktor K. Prasanna, "High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs," IEEE Transactions on Parallel and Distributed Systems, vol. 18, no. 10, pp. 1377-1392, Oct. 2007, doi:10.1109/TPDS.2007.1068