2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 151-162
Rashid Kaleem, Dept. of Computer Science, University of Texas at Austin
Rajkishore Barik, Intel Labs, Santa Clara, CA
Tatiana Shpeisman, Intel Labs, Santa Clara, CA
Chunling Hu, Intel Labs, Santa Clara, CA
Brian T. Lewis, Intel Labs, Santa Clara, CA
Keshav Pingali, Dept. of Computer Science, University of Texas at Austin
Many processors today integrate a CPU and GPU on the same die, which allows them to share resources such as physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively use both the CPU and GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors. We present two online profiling-based scheduling algorithms: naïve and asymmetric. Our asymmetric scheduling algorithm uses low-overhead online profiling to automatically partition the work of data-parallel kernels between the CPU and GPU without input from application developers. It profiles on the CPU and GPU in a way that does not penalize GPU-centric workloads that run significantly faster on the GPU. It adapts to application characteristics by addressing: 1) load imbalance caused by irregularity, e.g., data-dependent control flow; 2) different amounts of work on each kernel call; and 3) multiple kernels with different characteristics. Unlike many existing approaches, which primarily target NVIDIA discrete GPUs, our scheduling algorithm does not require offline processing. We evaluate our asymmetric scheduling algorithm on a desktop system with an Intel 4th Generation Core Processor using a set of sixteen regular and irregular workloads from diverse application areas. On average, our asymmetric scheduling algorithm performs within 3.2% of the maximum throughput achieved by a CPU-and-GPU oracle that always chooses the best work partitioning between the CPU and GPU. These results underscore the feasibility of online profile-based heterogeneous scheduling on integrated CPU-GPU processors.
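The abstract does not spell out how profiled measurements become a CPU/GPU split. The sketch below assumes the simplest rule: give each device a share of the remaining work proportional to its measured throughput, so that both finish at about the same time. The names `ProfileSample`, `gpu_fraction`, and `partition` are hypothetical. This is not the paper's asymmetric algorithm. It deliberately omits the asymmetric sizing of profiling work, per-call variation in work, and per-kernel adaptation that the authors describe.

```python
from dataclasses import dataclass


@dataclass
class ProfileSample:
    """Hypothetical record: work items one device finished during profiling, and the time taken."""
    items: int
    seconds: float

    @property
    def rate(self) -> float:
        # Throughput in work items per second.
        if self.seconds <= 0:
            raise ValueError("profiling time must be positive")
        return self.items / self.seconds


def gpu_fraction(cpu: ProfileSample, gpu: ProfileSample) -> float:
    """Fraction f of the remaining work to assign to the GPU.

    With CPU rate r_c and GPU rate r_g, assigning N*f items to the GPU and
    N*(1-f) to the CPU gives equal finish times when N*f/r_g == N*(1-f)/r_c,
    i.e. f = r_g / (r_c + r_g).
    """
    r_c, r_g = cpu.rate, gpu.rate
    return r_g / (r_c + r_g)


def partition(total_items: int, cpu: ProfileSample, gpu: ProfileSample) -> tuple[int, int]:
    """Split the work left after profiling into (cpu_items, gpu_items)."""
    remaining = max(0, total_items - cpu.items - gpu.items)
    gpu_items = round(remaining * gpu_fraction(cpu, gpu))
    return remaining - gpu_items, gpu_items


if __name__ == "__main__":
    # Illustrative numbers only: the GPU is profiled at 4x the CPU's throughput.
    cpu_sample = ProfileSample(items=10_000, seconds=0.5)
    gpu_sample = ProfileSample(items=40_000, seconds=0.5)
    print(partition(1_000_000, cpu_sample, gpu_sample))
```

With these illustrative numbers, the 950,000 items left after profiling split 190,000 to the CPU and 760,000 to the GPU, and both devices need 9.5 s at their profiled rates. A real scheduler would also have to account for the cost of profiling itself and for kernels whose throughput varies across calls, which this proportional rule ignores.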
Graphics processing units, Kernel, Scheduling algorithms, Programming, C++ languages, Irregular applications, Heterogeneous computing, integrated GPUs, scheduling, load balancing
Rashid Kaleem, Rajkishore Barik, Tatiana Shpeisman, Chunling Hu, Brian T. Lewis, Keshav Pingali, "Adaptive heterogeneous scheduling for integrated GPUs", 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 151-162, 2014, doi:10.1145/2628071.2628088