2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) (2010)
Sept. 11, 2010 to Sept. 15, 2010
Uday Bondhugula, Advanced Compiler Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Sanjeeb Dash, Business Analytics and Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Oktay Gunluk, Business Analytics and Mathematical Sciences, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Lakshminarayanan Renganarayanan, Advanced Compiler Technologies, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Loop fusion has been studied extensively, but in isolation from other transformations. This was mainly due to the lack of a powerful intermediate representation for applying compositions of high-level transformations. Fusion interacts strongly with parallelism and locality. Currently, no models exist to determine good fusion structures integrated with all components of an auto-parallelizing compiler. This is also one reason why the full benefits of optimizing and automatically parallelizing long sequences of loop nests spanning hundreds of lines of code have never been explored. We present a fusion model in an integrated automatic parallelization framework that simultaneously optimizes for hardware prefetch stream buffer utilization, locality, and parallelism. Characterizing the legal space of fusion structures in the polyhedral compiler framework is not difficult. However, incorporating useful optimization criteria into such a legal space to pick good fusion structures is very hard. The model we propose captures utilization of hardware prefetch streams, loss of parallelism, and constraints imposed by privatization and code expansion in a single convex optimization space. The model scales very well to program sections spanning hundreds of lines of code. It has been implemented in the polyhedral pass of the IBM XL optimizing compiler. Experimental results demonstrate its effectiveness in finding good fusion structures for codes including SPEC benchmarks and large applications. Improvements ranging from 5% to nearly 2.75× are obtained over the current production compiler optimizer on these benchmarks.
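The tension between locality and prefetch stream utilization described above can be illustrated with a toy example. The sketch below is not the paper's model, which formulates fusion as a single convex optimization problem inside the XL polyhedral pass; it only shows how fusing two loops lets an intermediate array `b` be reused in the same iteration while raising the number of arrays that one loop streams concurrently. The function names, the `HW_PREFETCH_STREAMS` limit of 3, and the use of "distinct arrays per loop nest" as a proxy for prefetch streams are all assumptions made for illustration.

```python
# Toy illustration of a fusion trade-off (hypothetical; not the paper's model).

HW_PREFETCH_STREAMS = 3  # assumed per-core stream limit, chosen only for illustration


def unfused(a, c):
    n = len(a)
    b, d = [0.0] * n, [0.0] * n
    for i in range(n):      # nest 1 streams: a, b
        b[i] = 2.0 * a[i]
    for i in range(n):      # nest 2 streams: b, c, d (b is traversed a second time)
        d[i] = b[i] + c[i]
    return b, d


def fused(a, c):
    n = len(a)
    b, d = [0.0] * n, [0.0] * n
    for i in range(n):      # single nest streams: a, b, c, d
        b[i] = 2.0 * a[i]   # b[i] is produced and consumed in the same iteration
        d[i] = b[i] + c[i]  # (dependence distance 0, so fusion is legal here)
    return b, d


def streams(nests):
    """Crude stream count: number of distinct arrays touched by each loop nest."""
    return [len(set(arrays)) for arrays in nests]


def within_budget(nests, limit=HW_PREFETCH_STREAMS):
    """True if no loop nest needs more concurrent streams than the assumed limit."""
    return all(s <= limit for s in streams(nests))


UNFUSED_NESTS = [("a", "b"), ("b", "c", "d")]
FUSED_NESTS = [("a", "b", "c", "d")]

if __name__ == "__main__":
    a = [float(i) for i in range(8)]
    c = [1.0] * 8
    assert unfused(a, c) == fused(a, c)  # fusion does not change the results
    print(streams(UNFUSED_NESTS), within_budget(UNFUSED_NESTS))  # [2, 3] True
    print(streams(FUSED_NESTS), within_budget(FUSED_NESTS))      # [4] False
```

Under these assumptions, the fused version improves temporal reuse of `b` but exceeds the assumed stream budget, which is the kind of conflict a fusion model has to weigh alongside parallelism.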
Locality optimization, Automatic parallelization, Loop fusion, Polyhedral model, Prefetching
U. Bondhugula, S. Dash, O. Gunluk and L. Renganarayanan, "A model for fusion and code motion in an automatic parallelizing compiler," 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, 2010, pp. 343-352.