2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (2016)
Haifa, Israel
Sept. 11, 2016 to Sept. 15, 2016
ISBN: 978-1-5090-5308-7
pp: 423-424
Guray Ozen, Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
Eduard Ayguade, Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
Jesus Labarta, Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Spain
ABSTRACT
Early programs for GPU (Graphics Processing Unit) acceleration were based on a flat, bulk-parallel programming model in which programs had to perform a sequence of kernel launches from the host CPU. The latest releases of these devices support dynamic (or nested) parallelism, making it possible to launch kernels from threads running on the device without host intervention. Unfortunately, launching kernels from the device incurs higher overhead than launching them from the host CPU, which makes the exploitation of dynamic parallelism unprofitable. This paper proposes and evaluates the basic idea behind a user-directed code transformation technique, named collective dynamic parallelism (CollectiveDP), that targets the effective exploitation of nested parallelism in modern GPUs. The technique dynamically packs dynamic-parallelism kernel invocations and postpones their execution until a batch of them is available. We show that for sparse matrix-vector multiplication, CollectiveDP outperforms well-optimized libraries, making GPUs useful when matrices are highly irregular.
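The packing idea in the abstract can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions, not the authors' transformation or any GPU code: it simulates CSR sparse matrix-vector multiplication in plain Python, treats every row longer than a hypothetical `threshold` as one that would trigger a device-side child kernel, and simply counts simulated launches. The function names, the `threshold` parameter, and the launch counters are all invented for illustration.

```python
# CPU-side model of deferring and packing nested "kernel launches" in CSR SpMV.
# Assumption: a row with more than `threshold` nonzeros would spawn a child
# kernel on a real GPU. Nothing here runs on a device; launches are only counted.


def row_dot(row_ptr, col_idx, vals, x, r):
    """Dot product of CSR row r with the dense vector x."""
    return sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1]))


def spmv_nested(row_ptr, col_idx, vals, x, threshold):
    """Baseline model: every long row issues its own simulated child launch."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    launches = 0
    for r in range(n_rows):
        if row_ptr[r + 1] - row_ptr[r] > threshold:
            launches += 1  # one simulated child launch per long row
        y[r] = row_dot(row_ptr, col_idx, vals, x, r)
    return y, launches


def spmv_collective(row_ptr, col_idx, vals, x, threshold):
    """Packed model: long rows are recorded, then served by one simulated launch."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    pending = []  # postponed invocations (here: just the row index)
    for r in range(n_rows):
        if row_ptr[r + 1] - row_ptr[r] > threshold:
            pending.append(r)  # postpone instead of launching immediately
        else:
            y[r] = row_dot(row_ptr, col_idx, vals, x, r)
    launches = 0
    if pending:
        launches = 1  # a single simulated launch covers the whole batch
        for r in pending:
            y[r] = row_dot(row_ptr, col_idx, vals, x, r)
    return y, launches


if __name__ == "__main__":
    # 4x4 irregular matrix: rows hold 1, 4, 1 and 3 nonzeros.
    row_ptr = [0, 1, 5, 6, 9]
    col_idx = [0, 0, 1, 2, 3, 2, 0, 1, 2]
    vals = [1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    x = [1.0, 1.0, 1.0, 1.0]
    print(spmv_nested(row_ptr, col_idx, vals, x, threshold=2))
    print(spmv_collective(row_ptr, col_idx, vals, x, threshold=2))
```

Both functions compute the same result vector; in this model the only difference is how many launches are counted, which is the overhead the abstract says the technique aims to reduce.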
INDEX TERMS
Kernel, Parallel processing, Graphics processing units, Context, Sparse matrices, Parallel programming, Libraries
CITATION
Guray Ozen, Eduard Ayguade, Jesus Labarta, "POSTER - collective dynamic parallelism for directive based GPU programming languages and compilers", 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT), pp. 423-424, 2016, doi:10.1145/2967938.2974056