2014 23rd International Conference on Parallel Architecture and Compilation (PACT) (2014)
Edmonton, Canada
Aug. 23, 2014 to Aug. 27, 2014
ISBN: 978-1-5090-6607-0
pp: 509-510
Alexandre Isoard, ENS de Lyon, France
Today's hardware diversity exacerbates the need for optimizing compilers. A problem that arises when exploiting hardware accelerators (FPGA, GPU, dedicated boards) is how to automatically perform kernel/function offloading or outlining (as opposed to function inlining). The principle is to outsource part of the computation (the kernel to be performed on the accelerator) to more efficient but more specialized hardware. This requires static analysis to identify the kernel input (data read) and output (data produced), and code generation for the kernel itself, the associated transfers, and the synchronization with the rest of the code (on the host CPU). In general, such tasks are done by the developer, who is required to make the communications explicit, allocate and size the intermediate buffers, and segment the kernel into fitting chunks of computation. When a single kernel is offloaded in a three-phase process (i.e., upload, compute, store back), such programming remains feasible: for GPUs, developers can use OpenCL or CUDA, or rely on higher-level abstractions, such as the directives of OpenACC or the garbage collector mechanisms of SPOC. However, in some cases, it is necessary to decompose a kernel into a sequence of smaller kernels (to get blocking algorithms, thanks to loop tiling) that are optimized with pipelined communications and data reuse among blocks (tiles). The choice of tile sizes is driven by hardware capabilities such as memory bandwidth, memory size and organization, and computational power; such codes are extremely hard to obtain without automation and some cost model. The contribution supported by this abstract and the associated poster is a parametric (w.r.t. tile size) analysis technique to perform these steps, including inter-tile data reuse and pipelining, using polyhedral optimizations. It has been presented at the IMPACT'14 workshop [2].
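To make the idea of inter-tile data reuse concrete, the following is a minimal Python sketch, not the analysis presented in the poster. It assumes a hypothetical 1D three-point stencil, out[i] = a[i-1] + a[i] + a[i+1], tiled with a parametric tile size b; the names tile_loads and run_tiled are illustrative only. The accelerator memory is modeled by a dictionary, and pipelining of transfers with computation is not modeled. The polyhedral technique derives such load sets symbolically in the tile size, whereas this sketch simply enumerates them.

```python
def tile_loads(n, b, reuse):
    """Per tile, the half-open index range of `a` that must be transferred."""
    loads = []
    loaded_up_to = 0  # exclusive upper bound of indices already transferred
    for lo in range(1, n - 1, b):
        hi = min(lo + b, n - 1)            # the tile computes out[lo:hi]
        need_lo, need_hi = lo - 1, hi + 1  # and therefore reads a[lo-1:hi+1]
        if reuse:
            # Skip elements the previous tile already brought in.
            need_lo = max(need_lo, loaded_up_to)
        loads.append((need_lo, need_hi))
        loaded_up_to = need_hi
    return loads


def run_tiled(a, b, reuse=True):
    """Run the stencil tile by tile; return (result, elements transferred)."""
    n = len(a)
    out = [0] * n
    local = {}        # models the accelerator-side buffer (an assumption)
    transferred = 0
    tile_starts = range(1, n - 1, b)
    for lo, (ld_lo, ld_hi) in zip(tile_starts, tile_loads(n, b, reuse)):
        hi = min(lo + b, n - 1)
        if not reuse:
            local.clear()                  # every tile starts from scratch
        for i in range(ld_lo, ld_hi):      # "upload" phase for this tile
            local[i] = a[i]
            transferred += 1
        for i in range(lo, hi):            # "compute" phase for this tile
            out[i] = local[i - 1] + local[i] + local[i + 1]
        # Keep only the elements the next tile will read again.
        local = {k: v for k, v in local.items() if k >= hi - 1}
    return out, transferred


if __name__ == "__main__":
    data = list(range(10))
    print(run_tiled(data, 4, reuse=True)[1])
    print(run_tiled(data, 4, reuse=False)[1])
```

In this toy setting, each tile after the first transfers b elements instead of b + 2 when reuse is enabled, and the local buffer never holds more than b + 2 elements, which illustrates how tile size ties transfer volume to accelerator memory size.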
Kernel, Field programmable gate arrays, Analytical models, Graphics processing units, Hardware, Arrays, Programming

A. Isoard, "Data-reuse optimizations for pipelined tiling with parametric tile sizes," 2014 23rd International Conference on Parallel Architecture and Compilation (PACT), Edmonton, Canada, 2014, pp. 509-510.