Issue No.04 - April (2010 vol.59)
pp: 433-448
Ben Cope, Imperial College London, London
Peter Y.K. Cheung, Imperial College London, London
Wayne Luk, Imperial College London, London
Lee Howes, Imperial College London
A systematic approach to comparing the graphics processor (GPU) with reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and to two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). For arithmetic-intensive algorithms, each device achieves a speedup of two orders of magnitude over a general-purpose processor. The FPGA is superior to the GPU for algorithms requiring large numbers of regular memory accesses, while the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, a customized data path implemented in an FPGA exceeds GPU performance by up to eight times. The implications of these trends for newer and future technologies are also analyzed.
Graphics processors, reconfigurable hardware, real-time and embedded systems, signal processing systems, performance measures, video.
Ben Cope, Peter Y.K. Cheung, Wayne Luk, Lee Howes, "Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study", IEEE Transactions on Computers, vol.59, no. 4, pp. 433-448, April 2010, doi:10.1109/TC.2009.179