2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 225-236
Qiumin Xu, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
Murali Annavaram, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA
ABSTRACT
General purpose computing using graphics processing units (GPGPUs) is an attractive option for achieving power-efficient throughput computing. However, the power efficiency of GPGPUs can be significantly curtailed in the presence of branch divergence. This paper evaluates two important facets of this problem. First, we study the branch divergence behavior of various GPGPU workloads. We show that only a few branch divergence patterns are dominant in most workloads. In fact, only five branch divergence patterns account for 60% of all the divergent instructions in our workloads. Second, we exploit this bias in branch divergence patterns to propose a new divergence-pattern-aware warp scheduler, called PATS. PATS prioritizes scheduling warps with the same divergence pattern so as to create long idleness windows for any given execution lane. These long idleness windows are then exploited to power gate the unused lanes efficiently while amortizing the gating overhead. We describe the architectural implementation details of PATS and evaluate its power and performance impact. Our proposed design significantly improves the power gating efficiency of GPGPUs with minimal performance overhead.
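The scheduling idea in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation: the function names, the 4-lane width, the one-instruction-per-cycle issue model, and the break-even cost are all assumptions made for illustration. It represents each warp instruction by its active mask (the divergence pattern), groups instructions that share a mask, and counts how many idle lane-cycles fall in windows long enough to amortize an assumed gating overhead.

```python
from collections import Counter


def pattern_aware_order(masks):
    """Group instructions sharing an active mask, most common mask first.

    Hypothetical policy for illustration; not the PATS hardware design.
    """
    counts = Counter(masks)
    # sorted() is stable, so equal-mask entries keep their relative order.
    return sorted(masks, key=lambda m: (-counts[m], m))


def gateable_cycles(schedule, num_lanes, break_even):
    """Sum idle lane-cycles beyond an assumed break-even gating cost.

    A lane's idle window of length n contributes n - break_even cycles
    when n > break_even, and nothing otherwise (simplified model).
    """
    full_mask = (1 << num_lanes) - 1
    total = 0
    for lane in range(num_lanes):
        run = 0
        # A trailing full mask closes any idle window still open at the end.
        for mask in list(schedule) + [full_mask]:
            if mask & (1 << lane):
                if run > break_even:
                    total += run - break_even
                run = 0
            else:
                run += 1
    return total


# Two divergence patterns, issued alternately across four lanes.
interleaved = [0b0011, 0b1100] * 4
grouped = pattern_aware_order(interleaved)

print(gateable_cycles(interleaved, num_lanes=4, break_even=2))  # 0
print(gateable_cycles(grouped, num_lanes=4, break_even=2))      # 8
```

In this toy setting the interleaved order never leaves a lane idle for more than one cycle, so no window exceeds the assumed break-even cost, whereas the grouped order leaves each lane idle for four consecutive cycles.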
INDEX TERMS
Instruction sets, Benchmark testing, Graphics processing units, Scheduling, Hardware, Kernel, Registers, warp scheduling, branch divergence, GPGPU, pattern, power gating
CITATION
Qiumin Xu, Murali Annavaram, "PATS: Pattern aware scheduling and power gating for GPGPUs", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 225-236, 2014, doi:10.1145/2628071.2628105