The Community for Technology Leaders
2015 International Conference on Parallel Architecture and Compilation (PACT) (2015)
San Francisco, CA, USA
Oct. 18, 2015 to Oct. 21, 2015
ISSN: 1089-795X
ISBN: 978-1-4673-9524-3
pp: 355-366
Traditionally, programmers and software tools have focused on mapping a single data-parallel kernel onto a heterogeneous computing system consisting of multiple general-purpose processors (CPUS) and graphics processing units (GPUs). These methodologies break down as application complexity grows to contain multiple communicating data-parallel kernels. This paper introduces MKMD, an automatic system for mapping multiple kernels across multiple computing devices in a seamless manner. MKMD is a two phased approach that combines coarse grain scheduling of indivisible kernels followed by opportunistic fine-grained workgroup-level partitioning to exploit idle resources. During this process, MKMD considers kernel dependencies and the underlying systems along with the execution time model built with a few sets of profile data. With the scheduling decision, MKMD transparently manages the order of executions and data transfers for each device. On a real machine with one CPU and two different GPUs, MKMD achieves a mean speedup of 1.89x compared to the in-order execution on the fastest device for a set of applications with multiple kernels. 53% of this speedup comes from the coarse-grained scheduling and the other 47% is the result of the fine-grained partitioning.
Kernel, Performance evaluation, Processor scheduling, Computational modeling, Hardware, Mathematical model, Schedules

J. Lee, M. Samadi and S. Mahlke, "Orchestrating Multiple Data-Parallel Kernels on Multiple Devices," 2015 International Conference on Parallel Architecture and Compilation (PACT), San Francisco, CA, USA, 2015, pp. 355-366.
93 ms
(Ver 3.3 (11022016))