Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures
September/October 2011 (vol. 31 no. 5)
pp. 66-75
Jeremy S. Meredith, Oak Ridge National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
Kyle L. Spafford, Oak Ridge National Laboratory
Jeffrey S. Vetter, Oak Ridge National Laboratory

This article considers trends in heterogeneous system design, particularly for GPUs. Using the Keeneland Initial Delivery System, the authors examine the performance implications of increased parallelism and specialized hardware on parallel scientific applications. They examine how nonuniform data-transfer performance across a node's device topology can affect application performance. Finally, they help users of GPU-based systems avoid performance problems related to this nonuniformity.
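
One common way to sidestep the nonuniformity described above is to run each process on the CPU cores of the NUMA node closest to the GPU it drives. The sketch below plans that placement. It is not presented as the authors' procedure. The GPU-to-NUMA map, the core numbering, and the helper names (GPU_TO_NUMA, cores_of, plan_affinity) are all hypothetical; on a real node the topology would come from a tool such as `nvidia-smi topo -m` or hwloc rather than being hard-coded.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL node layout: 2 sockets (NUMA nodes 0 and 1), 6 cores each,
# 3 GPUs, with GPU 0 attached near socket 0 and GPUs 1 and 2 near socket 1.
GPU_TO_NUMA: Dict[int, int] = {0: 0, 1: 1, 2: 1}
CORES_PER_NUMA = 6


def cores_of(numa_node: int, cores_per_numa: int) -> List[int]:
    """Core IDs of a NUMA node, ASSUMING cores are numbered socket by socket."""
    start = numa_node * cores_per_numa
    return list(range(start, start + cores_per_numa))


def plan_affinity(local_rank: int,
                  gpu_to_numa: Dict[int, int],
                  cores_per_numa: int) -> Tuple[int, List[int]]:
    """Return (gpu_id, cpu_list) for one process on the node.

    Ranks are assigned to GPUs round-robin; each rank is then limited to
    the cores of the NUMA node its GPU is attached to, so that host buffers
    it touches first are placed (under Linux's default local-allocation
    policy) in memory on the same side of the node as that GPU.
    """
    gpus = sorted(gpu_to_numa)
    gpu = gpus[local_rank % len(gpus)]
    return gpu, cores_of(gpu_to_numa[gpu], cores_per_numa)


if __name__ == "__main__":
    for rank in range(3):
        gpu, cpus = plan_affinity(rank, GPU_TO_NUMA, CORES_PER_NUMA)
        # On Linux the plan could be applied with os.sched_setaffinity(0, cpus),
        # and the device chosen via CUDA_VISIBLE_DEVICES or cudaSetDevice(gpu).
        print(f"rank {rank}: GPU {gpu}, cores {cpus}")
```

Keeping the plan separate from the calls that apply it makes it easy to check the mapping for a given node before binding anything; the same mapping can also be expressed with a launcher or `numactl` options instead.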

Index Terms:
GPU, nonuniformity, heterogeneous GPUs, data-transfer performance
Citation:
Jeremy S. Meredith, Philip C. Roth, Kyle L. Spafford, Jeffrey S. Vetter, "Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures," IEEE Micro, vol. 31, no. 5, pp. 66-75, Sept.-Oct. 2011, doi:10.1109/MM.2011.79