Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 257-267
Wenhao Jia , Princeton Univ., Princeton, NJ, USA
Kelly A. Shaw , Univ. of Richmond, Richmond, VA, USA
Margaret Martonosi , Princeton Univ., Princeton, NJ, USA
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow in scope and heuristic in operation. This paper proposes and evaluates a statistical analysis technique, Starchart, that partitions the GPU hardware/software tuning space by automatically discerning important inflection points in design parameter values. Unlike prior methods, Starchart can identify the best parameter choices within different regions of the space. Our tool is efficient - evaluating at most 0.3% of the tuning space, and often much less - and is robust enough to analyze highly variable real-system measurements, not just simulation. In one case study, we use it to automatically find platform-specific parameter settings that are 6.3× faster (for AMD) and 1.3× faster (for NVIDIA) than a single general setting. We also show how power-optimized parameter settings can save 47W (26% of total GPU power) with little performance loss. Overall, Starchart can serve as a foundation for a range of GPU compiler optimizations, auto-tuners, and programmer tools. Furthermore, because Starchart does not rely on specific GPU features, we expect it to be useful for broader CPU/GPU studies as well.
Graphics processing units, Kernel, Tuning, Hardware, Optimization, Power measurement
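As a rough illustration of what "discerning important inflection points in design parameter values" could look like, the sketch below searches one tuning parameter for the threshold that best separates measured runtimes into two groups. The splitting criterion (minimizing the summed squared error of the two sides), the function names, and the sample data are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(samples):
    """Find the threshold on one parameter that best partitions the samples.

    samples: list of (parameter_value, measured_runtime) pairs.
    Returns (threshold, error), where error is the summed squared error of
    the two sides, or None if all parameter values are identical.
    """
    samples = sorted(samples)
    best = None
    for i in range(1, len(samples)):
        lo_x, hi_x = samples[i - 1][0], samples[i][0]
        if lo_x == hi_x:
            continue  # no threshold can separate equal parameter values
        left = [y for _, y in samples[:i]]
        right = [y for _, y in samples[i:]]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = ((lo_x + hi_x) / 2, err)
    return best


# Hypothetical measurements: (work-group size, runtime in ms).
samples = [(32, 9.8), (64, 10.1), (128, 10.0),
           (256, 4.1), (512, 3.9), (1024, 4.0)]
threshold, err = best_split(samples)
print(f"split at {threshold}, residual error {err:.3f}")
```

Applying the same search recursively to each side of the split, and across several parameters, would yield a tree whose leaves correspond to regions of the tuning space with similar behavior.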

W. Jia, K. A. Shaw and M. Martonosi, "Starchart: Hardware and software optimization using recursive partitioning regression trees," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Edinburgh, United Kingdom, 2013, pp. 257-267.