Issue No.01 - January (2011 vol.22)
pp: 58-68
Rick Weber, University of Tennessee, Knoxville
Akila Gothandaraman, University of Pittsburgh, Pittsburgh
Robert J. Hinde, University of Tennessee, Knoxville
Gregory D. Peterson, University of Tennessee, Knoxville
Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.
Accelerator, OpenCL, FPGA, GPU, multicore, CUDA, computational science.
Rick Weber, Akila Gothandaraman, Robert J. Hinde, Gregory D. Peterson, "Comparing Hardware Accelerators in Scientific Applications: A Case Study", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 1, pp. 58-68, January 2011, doi:10.1109/TPDS.2010.125