2006 International Conference on Parallel Architectures and Compilation Techniques (PACT) (2006)
Seattle, WA, USA
Sept. 16, 2006 to Sept. 20, 2006
ISBN: 978-1-5090-3022-4
pp: 33-42
Abhishek Das, Stanford University
William J. Dally, Stanford University
Peter Mattson, Stream Processors, Inc.
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses information about the program structure and estimates of kernel and memory operation execution times to overlap kernel execution with memory transfers, maximizing performance, and to optimize use of scarce on-chip memory, significantly reducing external memory bandwidth. Our compiler applies optimizations such as strip-mining, loop unrolling, and software pipelining at the level of kernels and stream memory operations. We evaluate the performance of our compiler on a suite of media and scientific benchmarks. Our results show that compiler management of on-chip storage reduces external memory bandwidth by 35% to 93% and reduces execution time by 23% to 72% compared to cache-like LRU management of the same storage. We show that strip-mining stream applications enables producer-consumer locality to be captured in on-chip storage, reducing external bandwidth by 50% to 80%. We also evaluate the sensitivity of performance to the scheduling methods used and to critical resources. Overall, our compiler is able to overlap memory operations and manage local storage so that 78% to 96% of program execution time is spent running computational kernels.
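As a rough illustration of why strip-mining can cut external bandwidth, the following toy model counts words moved to and from external memory for a two-kernel producer-consumer chain. It is a minimal sketch, not the paper's compiler: the function names, the word-counting traffic model, and the assumption that the full intermediate stream does not fit on chip while a single strip does are all hypothetical.

```python
def run_unstripped(data, producer, consumer):
    """Run each kernel over the whole stream.

    Assumption: the full intermediate stream is too large for on-chip
    storage, so it is written to external memory and read back.
    """
    traffic = len(data)                      # load the input stream
    mid = [producer(x) for x in data]
    traffic += len(mid)                      # spill the intermediate stream
    traffic += len(mid)                      # reload it for the consumer
    out = [consumer(y) for y in mid]
    traffic += len(out)                      # store the output stream
    return out, traffic


def run_strip_mined(data, producer, consumer, strip_size):
    """Run both kernels strip by strip.

    Assumption: one strip of input, intermediate, and output data fits
    on chip, so the intermediate never touches external memory.
    """
    out, traffic = [], 0
    for start in range(0, len(data), strip_size):
        strip = data[start:start + strip_size]
        traffic += len(strip)                # load one input strip
        mid = [producer(x) for x in strip]   # intermediate stays on chip
        res = [consumer(y) for y in mid]
        traffic += len(res)                  # store one output strip
        out.extend(res)
    return out, traffic


if __name__ == "__main__":
    data = list(range(1000))
    double = lambda x: 2 * x
    add_one = lambda y: y + 1
    _, whole = run_unstripped(data, double, add_one)
    _, strips = run_strip_mined(data, double, add_one, strip_size=64)
    print(whole, strips)
```

In this toy model both versions compute the same output, and the strip-mined version moves half as many words because the intermediate stream is never spilled. The actual savings in a real stream program depend on the kernel graph and the on-chip capacity.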
Scoreboard slot assignment, Stream programming model, StreamC, Task-level parallelism, Producer-consumer locality, Stream scheduling, Coarse-grained operations, Stream Operation Precedence (SOP) graph, SRF allocation, Strip-mining, Software pipelining

A. Das, W. J. Dally and P. Mattson, "Compiling for stream processing," 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT), Seattle, WA, USA, 2006, pp. 33-42.