Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques (1997)
San Francisco, CA
Nov. 11, 1997 to Nov. 15, 1997
Ramakrishnan Rajamony , Rice University
Alan L. Cox , Rice University
We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop--carried dependences. In contrast to existing schemes, our algorithms add lesser synchronization, while preserving the parallelism that can be extracted from the loop. Our first algorithm uses an interval graph representation of the dependence "overlap'' to find a synchronization placement in time almost linear in the number of dependences. Although this solution may be suboptimal, it is still better than that obtained using existing methods, which first eliminate redundant dependences and then synchronize the remaining ones. Determining the optimal synchronization is an NP--complete problem. Our second algorithm therefore uses integer programming to determine the optimal solution. We first use a polynomial--time algorithm to find a minimal search space that must contain the optimal solution. Then, we formulate the problem of choosing the minimal synchronization from the search space as a set--cover problem, and solve it exactly using 0--1 integer programming. We show the performance impact of our algorithms by synchronizing a set of synthetic loops on an 8--processor Convex Exemplar. The greedily synchronized loops ran between 7% and 22% faster than those synchronized by the best existing algorithm. Relative to the same base, the optimally synchronized loops ran between 10% and 22% faster.
Doacross loops, synchronization, shared memory multiprocessors
A. L. Cox and R. Rajamony, "Optimally Synchronizing DOACROSS Loops on Shared Memory Multiprocessors," Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques(PACT), San Francisco, CA, 1997, pp. 214.