Issue No. 04, July/August 2008 (vol. 28)
pp. 13-27
John Nickolls, NVIDIA
Joshua Anderson, Iowa State University and Ames Laboratory
Jim Hardwick, TechniScan Medical Systems
Everett Phillips, University of California, Davis
Yao Zhang, University of California, Davis
Vasily Volkov, University of California, Berkeley
The CUDA programming model provides a straightforward means of describing inherently parallel computations, and NVIDIA's Tesla GPU architecture delivers high computational throughput on massively parallel problems. This article surveys experience gained in applying CUDA to a diverse set of problems and reports the parallel speedups, relative to sequential codes running on traditional CPU architectures, attained by executing key computations on the GPU.
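To make "a straightforward means of describing inherently parallel computations" concrete, here is a minimal sketch. It is not CUDA code and is not taken from the article: it is a sequential C++ emulation, assuming a one-dimensional grid, of how a CUDA kernel assigns one logical thread to each array element through the index `blockIdx.x * blockDim.x + threadIdx.x`. The `Dim` struct (a stand-in for CUDA's `dim3`) and the helpers `saxpy_thread` and `launch_saxpy` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 (only the x component is modeled).
struct Dim { unsigned x; };

// Per-thread body of a SAXPY "kernel" (y = a*x + y). In real CUDA this would
// be a __global__ function that reads the built-in blockIdx, blockDim, and
// threadIdx variables instead of taking them as parameters.
void saxpy_thread(Dim blockIdx, Dim blockDim, Dim threadIdx,
                  std::size_t n, float a, const float* x, float* y) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may extend past the end of the data
        y[i] = a * x[i] + y[i];
    }
}

// Sequential emulation of a launch with ceil(n / threadsPerBlock) blocks of
// threadsPerBlock threads each. Every (block, thread) pair runs exactly once;
// on a GPU these invocations would run in parallel instead of in a loop.
void launch_saxpy(std::size_t n, float a, const float* x, float* y,
                  unsigned threadsPerBlock = 256) {
    unsigned blocks = static_cast<unsigned>((n + threadsPerBlock - 1) / threadsPerBlock);
    Dim blockDim{threadsPerBlock};
    for (unsigned b = 0; b < blocks; ++b)
        for (unsigned t = 0; t < threadsPerBlock; ++t)
            saxpy_thread(Dim{b}, blockDim, Dim{t}, n, a, x, y);
}
```

For example, with `n = 1000` and 256 threads per block, the launch creates 4 blocks (1,024 logical threads), and the bounds check leaves the 24 surplus threads idle. The programmer writes only the per-element body; mapping blocks onto the GPU's processors is left to the hardware and runtime, which is what lets the same code scale across devices.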
parallel architectures, processor architectures, computer systems organization, concurrent programming structures, graphics processors, programming languages, computer graphics, computing methodologies
Michael Garland, Scott Le Grand, John Nickolls, Joshua Anderson, Jim Hardwick, Scott Morton, Everett Phillips, Yao Zhang, Vasily Volkov, "Parallel Computing Experiences with CUDA," IEEE Micro, vol. 28, no. 4, pp. 13-27, July/August 2008, doi:10.1109/MM.2008.57