Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (1999)
Newport Beach, California
Oct. 12, 1999 to Oct. 16, 1999
M.F.P. O'Boyle, University of Edinburgh
P.M.W. Knijnenburg, Leiden University
This paper attempts to minimize parallelization overhead on distributed shared memory machines, such as the SGI Origin 2000, by combining non-singular loop and data transformations. We show that conflicting requirements on a loop transformation may be resolved by using a data transformation, and vice versa. We develop optimization criteria for locality, synchronization and communication, and show that neither loop nor data transformations alone are sufficient for efficient parallelization. This leads to a novel global optimization heuristic, which is applied to three SPEC kernels. On these kernels it outperforms techniques based solely on loop or solely on data transformations, and gives a significant improvement over an existing state-of-the-art commercial auto-parallelizer.
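As a small illustration of the kind of conflict described above (not taken from the paper), the sketch below models a two-deep loop nest that reads `A[i][j]` and `B[j][i]` from row-major arrays. The helper names `matmul2` and `inner_stride`, the array size `N`, and the choice of loop interchange and array transposition as the two transformations are assumptions made for this example. It computes the memory stride seen by the innermost loop: interchanging the loops repairs the stride of `B` but breaks that of `A`, whereas transposing the layout of `B` alone gives both references unit stride.

```python
# Illustrative sketch only: names and sizes are hypothetical, not from the paper.

N = 8  # assumed number of columns in each row-major array

def matmul2(x, y):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def inner_stride(access, n=N):
    """Address stride between consecutive innermost-loop iterations.

    `access` maps the iteration vector (outer, inner) to array subscripts
    (row, col). In a row-major array with n columns the address is
    row * n + col, so the stride is taken from the innermost-loop column.
    """
    return access[0][1] * n + access[1][1]

IDENTITY = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]      # non-singular; SWAP is its own inverse

U_A = IDENTITY               # reference A[i][j]
U_B = SWAP                   # reference B[j][i]

# Original nest: A has unit stride, B jumps a full row each iteration.
original = (inner_stride(U_A), inner_stride(U_B))

# Loop interchange T = SWAP: each access matrix becomes U * T^{-1}.
interchanged = (inner_stride(matmul2(U_A, SWAP)),
                inner_stride(matmul2(U_B, SWAP)))

# Data transformation D = SWAP applied to B only: its access becomes D * U_B.
transposed_b = (inner_stride(U_A), inner_stride(matmul2(SWAP, U_B)))

print("original     (A, B):", original)
print("interchanged (A, B):", interchanged)
print("B transposed (A, B):", transposed_b)
```

In this toy model no single loop transformation serves both references, while a data transformation on one array does; the paper's heuristic addresses the general case, including synchronization and communication, which this sketch does not model.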
M.F.P. O'Boyle, P.M.W. Knijnenburg, "Efficient Parallelization using Combined Loop and Data Transformations", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, p. 283, 1999, doi:10.1109/PACT.1999.807573