Issue No.08 - August (2012 vol.45)
pp: 26-32
John A. Stratton, University of Illinois at Urbana-Champaign
Christopher Rodrigues, University of Illinois at Urbana-Champaign
I-Jui Sung, University of Illinois at Urbana-Champaign
Li-Wen Chang, University of Illinois at Urbana-Champaign
Nasser Anssari, University of Illinois at Urbana-Champaign
Geng Liu, University of Illinois at Urbana-Champaign
Wen-mei W. Hwu, University of Illinois at Urbana-Champaign
Nady Obeid, KLA-Tencor
ABSTRACT
A study of the implementation patterns among massively threaded applications for many-core GPUs reveals that each of the seven most commonly used algorithm and data optimization techniques can enhance the performance of applicable kernels by 2 to 10× in current processors while also improving future scalability. The featured Web extra is a video interview with author John Stratton, who describes how implementation patterns can improve future scalability. YouTube URL: http://youtu.be/fgn9LJbInMw
INDEX TERMS
Instruction sets, System-on-a-chip, Bandwidth, Histograms, Optimization, Graphics processing unit, Multithreading, Parboil benchmarks, massively threaded systems, optimization patterns, accelerators, scalability
CITATION
John A. Stratton, Christopher Rodrigues, I-Jui Sung, Li-Wen Chang, Nasser Anssari, Geng Liu, Wen-mei W. Hwu, Nady Obeid, "Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems", Computer, vol.45, no. 8, pp. 26-32, August 2012, doi:10.1109/MC.2012.194