Issue No. 03 - May/June 2009 (vol. 11)
pp. 16-26
Wen-Mei Hwu, University of Illinois
Christopher Rodrigues, University of Illinois
Shane Ryoo, ZeroSoft
John Stratton, University of Illinois
ABSTRACT
Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.
INDEX TERMS
CUDA, GPGPU, computer architecture, software optimization, benchmarks, compute unified device architecture, general-purpose computing on GPU
CITATION
Wen-Mei Hwu, Christopher Rodrigues, Shane Ryoo, John Stratton, "Compute Unified Device Architecture Application Suitability," Computing in Science & Engineering, vol. 11, no. 3, pp. 16-26, May/June 2009, doi:10.1109/MCSE.2009.48