2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/
Hao Wang, The University of Wisconsin-Madison, U.S.A.
Vijay Sathish, The University of Wisconsin-Madison, U.S.A.
Ripudaman Singh, The University of Wisconsin-Madison, U.S.A.
Michael J. Schulte, Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim, The University of Wisconsin-Madison, U.S.A.
With technology scaling, manufacturers are integrating both CPU and GPU cores in a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the chip power budget shared between the CPU and GPU must be effectively utilized. At the same time, the CPU and GPU in an SCHP must each satisfy its own power constraint. Furthermore, the power budget allocated to the CPU and GPU impacts performance. In this paper, using a detailed cycle-level SCHP simulator, we first demonstrate that the joint optimization of workload and power budget partitioning between the CPU and GPU can provide 13% higher throughput than the optimization of workload partitioning alone under a fixed power budget allocation to the CPU and GPU. Second, we propose an effective runtime algorithm that can determine near-optimal or optimal combinations of workload and power budget partitioning. The algorithm exploits the runtime power efficiencies of the workload executed on the CPU and the GPU. Using the detailed cycle-level SCHP simulator, we show that within five to eight kernel invocations the algorithm can achieve 96% of the maximum throughput obtained by an exhaustive search algorithm. Finally, we demonstrate comparable throughput improvements when we apply the algorithm to a commercial computing system with an SCHP.
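To illustrate why joint partitioning can outperform workload partitioning under a fixed power split, here is a minimal sketch based on a toy model. It is not the paper's cycle-level simulator or its runtime algorithm. Everything in it is an assumption made for illustration: the rate model (throughput proportional to the cube root of power), the speed constants k_cpu and k_gpu, the 40 W chip budget, and the 30 W per-device caps. It uses an exhaustive grid search, which corresponds only loosely to the exhaustive-search baseline mentioned in the abstract.

```python
# Toy model (hypothetical, not from the paper): CPU and GPU run their shares
# of one kernel concurrently, so kernel time is the slower of the two.

def rate(k, power, alpha=1 / 3):
    """Assumed device speed: k * power**alpha (illustrative only)."""
    return k * power ** alpha


def throughput(w, p_cpu, p_gpu, k_cpu=1.0, k_gpu=4.0):
    """Throughput when fraction w of the work goes to the CPU."""
    t_cpu = w / rate(k_cpu, p_cpu) if w > 0 else 0.0
    t_gpu = (1 - w) / rate(k_gpu, p_gpu) if w < 1 else 0.0
    return 1.0 / max(t_cpu, t_gpu)


def best_workload_split(p_cpu, p_gpu, steps=100):
    """Optimize the workload fraction only, for a fixed power allocation."""
    return max(
        (throughput(i / steps, p_cpu, p_gpu), i / steps)
        for i in range(steps + 1)
    )


def best_joint(total=40, cap_cpu=30, cap_gpu=30, steps=100):
    """Exhaustively search workload fraction and power split together.

    The two power shares sum to the chip budget `total`, and each device
    must also stay within its own cap.
    """
    best = (0.0, None, None)  # (throughput, cpu work fraction, cpu power)
    for p_cpu in range(1, total):
        p_gpu = total - p_cpu
        if p_cpu > cap_cpu or p_gpu > cap_gpu:
            continue  # violates a per-device power constraint
        tp, w = best_workload_split(p_cpu, p_gpu, steps)
        if tp > best[0]:
            best = (tp, w, p_cpu)
    return best


if __name__ == "__main__":
    fixed_tp, fixed_w = best_workload_split(20, 20)  # even 20 W / 20 W split
    joint_tp, joint_w, joint_p_cpu = best_joint()
    print(f"fixed split: throughput={fixed_tp:.3f}, cpu share={fixed_w:.2f}")
    print(f"joint:       throughput={joint_tp:.3f}, cpu share={joint_w:.2f}, "
          f"cpu power={joint_p_cpu} W")
```

Because the even 20 W / 20 W split is one of the candidates the joint search visits, the joint result can never be worse than the fixed-split result in this model. How large the gap is depends entirely on the assumed constants and says nothing about the 13% figure reported for the real system.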
Graphics processing units, Throughput, Central Processing Unit, Runtime, Algorithm design and analysis, Partitioning algorithms
H. Wang, V. Sathish, R. Singh, M. J. Schulte and N. S. Kim, "Workload and power budget partitioning for single-chip heterogeneous processors," 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA, 2012, pp. 401-410.