2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 163-174
James A. Jablin , Brown University, Dept. of Computer Science
Thomas B. Jablin , University of Illinois at Urbana-Champaign, Dept. of Electrical and Computer Engineering
Onur Mutlu , Carnegie Mellon University, Dept. of Electrical and Computer Engineering
Maurice Herlihy , Brown University, Dept. of Computer Science
ABSTRACT
GPU performance depends not only on thread/warp-level parallelism (TLP) but also on instruction-level parallelism (ILP). It is not enough to schedule instructions within basic blocks; it is also necessary to exploit opportunities for ILP optimization beyond branch boundaries. Unfortunately, modern GPUs cannot dynamically carry out such optimizations because they lack hardware branch prediction and cannot speculatively execute instructions beyond a branch. We propose to circumvent these limitations by adapting Trace Scheduling, a technique originally developed for microcode optimization. Trace Scheduling divides code into traces (or paths) and optimizes each trace in a context-independent way. Adapting Trace Scheduling to GPU code requires revisiting and revising each step of microcode Trace Scheduling to attend to branch and warp behavior, identifying instructions on the critical path, avoiding warp divergence, and reducing divergence time. Here, we propose "Warp-Aware Trace Scheduling" for GPUs. As evaluated on the Rodinia Benchmark Suite using dynamic profiling, our fully automatic optimization achieves a geometric mean speedup of 1.10× on a real system by increasing instructions executed per cycle (IPC) by a harmonic mean of 1.12× and reducing instruction serialization and total instructions executed.
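As an illustration of the trace-formation step mentioned in the abstract, the following is a minimal sketch of classic profile-guided trace selection, not the paper's warp-aware algorithm. It assumes a control-flow graph given as hypothetical block and edge execution counts; the function name `select_traces` and the example graph are illustrative only. The sketch grows each trace forward only and ignores warp divergence entirely.

```python
from typing import Dict, List, Set, Tuple


def select_traces(block_counts: Dict[str, int],
                  edge_counts: Dict[Tuple[str, str], int]) -> List[List[str]]:
    """Greedily partition basic blocks into traces using profile counts.

    Assumption: counts come from some dynamic profile; the names are hypothetical.
    """
    visited: Set[str] = set()
    traces: List[List[str]] = []
    # Seed each trace with the hottest block not yet placed in a trace
    # (ties broken by block name so the result is deterministic).
    for seed in sorted(block_counts, key=lambda b: (-block_counts[b], b)):
        if seed in visited:
            continue
        trace = [seed]
        visited.add(seed)
        current = seed
        while True:
            # Candidate successors: unvisited targets of edges leaving `current`.
            candidates = [(-count, dst)
                          for (src, dst), count in edge_counts.items()
                          if src == current and dst not in visited]
            if not candidates:
                break
            neg_count, nxt = min(candidates)  # most frequently taken edge
            if neg_count == 0:
                break  # never-executed edge: do not extend the trace
            trace.append(nxt)
            visited.add(nxt)
            current = nxt
        traces.append(trace)
    return traces


# Hypothetical diamond-shaped CFG: entry branches to A (hot) or B (cold).
block_counts = {"entry": 100, "A": 90, "B": 10, "exit": 100}
edge_counts = {
    ("entry", "A"): 90,
    ("entry", "B"): 10,
    ("A", "exit"): 90,
    ("B", "exit"): 10,
}
traces = select_traces(block_counts, edge_counts)
print(traces)
```

On this example the hot path (entry, A, exit) forms one trace and the cold block B forms its own. A scheduler could then reorder instructions within each trace across the branch, inserting compensation code on the off-trace edges.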
INDEX TERMS
Graphics processing units, Scheduling, Instruction sets, Optimization, Processor scheduling, Schedules, trace scheduling, GPU, compiler optimization, instruction-level parallelism, global instruction scheduling
CITATION
James A. Jablin, Thomas B. Jablin, Onur Mutlu, Maurice Herlihy, "Warp-aware trace scheduling for GPUs", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), vol. 00, no. , pp. 163-174, 2014, doi:10.1145/2628071.2628101