Issue No.01 - January (2012 vol.23)
pp: 94-101
Lawrence Murray, CSIRO Mathematics, Wembley
We consider the use of commodity graphics processing units (GPUs) for the common task of numerically integrating ordinary differential equations (ODEs), achieving speedups of up to 115-fold over comparable serial CPU implementations, and 15-fold over multithreaded CPU code with SIMD intrinsics. Using Lorenz '96 models as a case study, single and double precision benchmarks are established for both the widely used DOPRI5 method and the computationally tailored low-storage RK4(3)5[2R+]C method. A range of configurations is assessed on each, including multithreading and SIMD intrinsics on the CPU, and GPU kernels parallelized over both the dimensionality of the ODE system and the number of trajectories. On the GPU, we draw particular attention to the problem of variable task length among threads of the same warp, proposing a lightweight strategy of assigning multiple data items to each thread to reduce the prevalence of redundant operations. A simple analysis suggests that the strategy can draw performance close to that of ideal parallelism, while empirical results demonstrate up to a 10 percent improvement over the standard approach.
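For readers unfamiliar with the case study, the following is a minimal serial sketch of a Lorenz '96 system advanced by an explicit Runge-Kutta step. It is not the paper's implementation: it substitutes the classical fixed-step fourth-order Runge-Kutta method for the adaptive DOPRI5 and low-storage RK4(3)5[2R+]C schemes that the paper benchmarks, and the function names, the forcing value F = 8, and the step size are all assumptions chosen for illustration. The outer loop over trajectories is the kind of independent work that the abstract describes distributing across GPU threads.

```python
def lorenz96_rhs(x, forcing=8.0):
    """Lorenz '96 tendency with cyclic indices:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F.
    The default forcing of 8.0 is an assumption for this sketch."""
    n = len(x)
    # Negative indices wrap around in Python, giving the cyclic boundary.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step of size h.
    This stands in for the adaptive schemes named in the abstract."""
    k1 = f(x)
    k2 = f([xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = f([xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(f, x0, h, steps):
    """Advance a single trajectory by a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(f, x, h)
    return x


def integrate_ensemble(f, initial_states, h, steps):
    """Advance many independent trajectories. Each iteration of this loop
    is independent of the others, so it could be mapped to one thread."""
    return [integrate(f, x0, h, steps) for x0 in initial_states]


if __name__ == "__main__":
    # Two hypothetical trajectories of an 8-dimensional system, each a
    # small perturbation of the equilibrium x_i = F.
    starts = [[8.0 + 0.01 * (i == j) for i in range(8)] for j in range(2)]
    for state in integrate_ensemble(lorenz96_rhs, starts, h=0.01, steps=100):
        print(state)
```

With a fixed step size, every trajectory costs the same number of steps. With an adaptive scheme, the step count varies per trajectory, which is the source of the variable task length among threads of a warp that the abstract addresses.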
Ordinary differential equations, initial value problems, Runge-Kutta integration, graphics hardware, GPGPU.
Lawrence Murray, "GPU Acceleration of Runge-Kutta Integrators", IEEE Transactions on Parallel & Distributed Systems, vol.23, no. 1, pp. 94-101, January 2012, doi:10.1109/TPDS.2011.61