Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
Sangmin Seo, ManyCoreSoft, Seoul, South Korea
Jun Lee, Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
Gangwon Jo, Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
Jaejin Lee, Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
In this paper, we address the effect of the work-group size on the performance of OpenCL kernels. We propose a profiling-based algorithm that finds a work-group size that performs well on the target multicore CPU architecture. Our algorithm reduces misses in the private L1 data cache and balances the load between cores. It exploits the polyhedral model to estimate the working-set size and the number of cache misses for a parameterized work-group size of the OpenCL kernel. Based on the profiling information, it heuristically searches the space of parameterized work-group sizes. Our virtually extended index space increases the likelihood of finding a better work-group size. We implement our work-group size selection algorithm as a development tool that consists of a code generator and a search library. The code generator extracts the polytope of each memory reference from the kernel code and generates a function that simplifies the polytopes using run-time information and invokes search library routines. The search library calculates the working-set size from the polytopes and finds a suitable work-group size. We evaluate our approach using 31 OpenCL kernels on four different multicore CPUs, comparing its accuracy and search time to those of an exhaustive search method. Experimental results show that our tool is, on average, 1566 times faster than the exhaustive search and selects a work-group size whose performance is the same as or comparable to that of the size found by the exhaustive search.
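As a rough illustration of the two criteria named in the abstract (fitting a work-group's working set in the private L1 data cache and balancing work-groups across cores), the following Python sketch searches the work-group shapes of a 2D index space. It is not the authors' algorithm: the closed-form `working_set_bytes` model for a matrix-multiply-like kernel stands in for the polyhedral estimate, and the function names, the 32 KiB L1 size, the 4-core count, and the ranking rule are all assumptions made for the example.

```python
from itertools import product


def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def working_set_bytes(wg, k, elem_bytes=4):
    """Toy model (assumption): bytes touched by one work-group of shape
    (wx, wy) computing C[i][j] += A[i][kk] * B[kk][j] for kk in range(k)."""
    wx, wy = wg
    return (wx * k + k * wy + wx * wy) * elem_bytes


def pick_work_group_size(global_size, k, l1_bytes=32 * 1024, cores=4):
    """Pick a shape whose working set fits in L1, preferring shapes whose
    work-group count divides evenly across cores, then larger work-groups.
    Returns None if no candidate fits."""
    gx, gy = global_size
    best, best_key = None, None
    # Only shapes that evenly divide the index space are considered.
    for wg in product(divisors(gx), divisors(gy)):
        if working_set_bytes(wg, k) > l1_bytes:
            continue  # working set would exceed the assumed L1 capacity
        num_groups = (gx // wg[0]) * (gy // wg[1])
        balanced = num_groups % cores == 0
        key = (balanced, wg[0] * wg[1])
        if best_key is None or key > best_key:
            best, best_key = wg, key
    return best


if __name__ == "__main__":
    print(pick_work_group_size((64, 64), k=64))
```

Restricting candidates to divisors of the index space is a simplification; the virtually extended index space described in the abstract is what relaxes this kind of restriction in the actual tool, and it is not modeled here.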
Search problems, Kernel, Tin
Sangmin Seo, Jun Lee, Gangwon Jo and Jaejin Lee, "Automatic OpenCL work-group size selection for multicore CPUs," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Edinburgh, United Kingdom, 2013, pp. 387-397.