2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 331-341
Hao Wang, The University of Wisconsin-Madison, WI, U.S.A.
Ripudaman Singh, The University of Wisconsin-Madison, WI, U.S.A.
Michael J. Schulte, Advanced Micro Devices, TX, U.S.A.
Nam Sung Kim, The University of Wisconsin-Madison, WI, U.S.A.
ABSTRACT
Technology scaling enables the integration of both the CPU and the GPU into a single chip for higher throughput and energy efficiency. In such a single-chip heterogeneous processor (SCHP), memory bandwidth is the most critical shared resource, requiring judicious management to maximize throughput. Previous studies on memory scheduling for SCHPs have focused on the scenario where multiple applications run separately on the CPU and the GPU, which we denote as a multitasking scenario. However, another increasingly important usage scenario for SCHPs is cooperative heterogeneous computing, where a single parallel application is partitioned between the CPU and the GPU such that the overall throughput is maximized. Previous studies on memory scheduling techniques for chip multi-processors (CMPs) and SCHPs treated the first-ready first-come-first-serve (FR-FCFS) scheduling policy as a weak baseline because of its fairness issues. However, in a cooperative heterogeneous computing scenario, we first demonstrate that FR-FCFS actually offers nearly 10% higher throughput than two recently proposed memory scheduling techniques designed for a multitasking scenario. Second, based on our analysis of memory access characteristics in a cooperative heterogeneous computing scenario, we propose various optimization techniques that enhance the row-buffer locality by 10%, reduce the service latency of CPU memory requests by 26%, and improve the overall throughput by up to 8% compared to FR-FCFS.
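For readers unfamiliar with the baseline policy named in the abstract, the following is a minimal sketch of the generic FR-FCFS selection rule: requests that hit the currently open row of a bank are served first, and ties are broken by age. It is not the authors' simulator or their proposed optimizations. The names `Request`, `fr_fcfs_pick`, and `drain` are hypothetical, and the model assumes one open row per bank and ignores DRAM timing constraints.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    arrival: int  # arrival order; a smaller value means an older request
    bank: int     # target DRAM bank
    row: int      # target row within that bank
    source: str   # "CPU" or "GPU"; plain FR-FCFS ignores the source


def fr_fcfs_pick(queue: List[Request], open_rows: Dict[int, int]) -> Optional[Request]:
    """Pick the next request: row-buffer hits first, then the oldest request."""
    if not queue:
        return None
    # The key sorts hits (False) before misses (True), then by arrival order.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))


def drain(queue: List[Request], open_rows: Dict[int, int]) -> List[Request]:
    """Serve every queued request, updating the open row after each access."""
    pending = list(queue)
    rows = dict(open_rows)
    order = []
    while pending:
        req = fr_fcfs_pick(pending, rows)
        pending.remove(req)
        rows[req.bank] = req.row  # the served row is now open in its bank
        order.append(req)
    return order


if __name__ == "__main__":
    queue = [
        Request(0, 0, 5, "CPU"),
        Request(1, 0, 7, "GPU"),
        Request(2, 0, 7, "GPU"),
        Request(3, 0, 5, "CPU"),
    ]
    # Row 7 is open in bank 0, so the two GPU row hits bypass the older CPU request.
    for req in drain(queue, {0: 7}):
        print(req.arrival, req.source, req.row)
```

In this toy queue the oldest CPU request waits behind two younger GPU row hits, which illustrates the kind of fairness concern the abstract attributes to FR-FCFS.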
INDEX TERMS
Graphics processing units, Throughput, Processor scheduling, Computational modeling, Central Processing Unit, Job shop scheduling, Benchmark testing, Heterogeneous processor, Memory scheduling
CITATION
Hao Wang, Ripudaman Singh, Michael J. Schulte, Nam Sung Kim, "Memory scheduling towards high-throughput cooperative heterogeneous computing", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 331-341, 2014, doi:10.1145/2628071.2628096