A Loop Transformation Theory and an Algorithm to Maximize Parallelism
October 1991 (vol. 2 no. 4)
pp. 452-471

This paper presents an approach to transformations for general loop nests in which dependence vectors represent precedence constraints on the iterations of a loop. Because the source of each dependence must execute before its target, the dependences extracted from a loop nest must be lexicographically positive. This leads to a simple test for the legality of compound transformations: any code transformation that leaves the dependences lexicographically positive is legal. The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest. It is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests. The canonical form of coarsest fully permutable nests can be transformed mechanically to yield maximum degrees of coarse- and/or fine-grain parallelism. Efficient heuristics can find the maximum degrees of parallelism for loops whose nesting level is less than five.
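The legality test stated in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes every dependence is a constant distance vector and every transformation is an integer matrix applied to those vectors. The function names, the example dependence set, and the matrices are all hypothetical and chosen only for illustration.

```python
from typing import Sequence

Vector = Sequence[int]
Matrix = Sequence[Sequence[int]]


def is_lex_positive(v: Vector) -> bool:
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the zero vector is not lexicographically positive


def transform(T: Matrix, d: Vector) -> list[int]:
    """Apply the transformation matrix T to the distance vector d."""
    return [sum(t * x for t, x in zip(row, d)) for row in T]


def is_legal(T: Matrix, deps: Sequence[Vector]) -> bool:
    """Legal if every transformed dependence stays lexicographically positive."""
    return all(is_lex_positive(transform(T, d)) for d in deps)


def carried_by_outer(T: Matrix, deps: Sequence[Vector]) -> bool:
    """True if the outermost loop carries every transformed dependence,
    which leaves the inner loops free of loop-carried dependences."""
    return all(transform(T, d)[0] > 0 for d in deps)


# Hypothetical dependences, e.g. from a[i][j] = a[i-1][j] + a[i][j-1].
deps = [(1, 0), (0, 1)]

interchange = [[0, 1], [1, 0]]      # swap the two loops
reverse_inner = [[1, 0], [0, -1]]   # run the inner loop backwards
wavefront = [[1, 1], [0, 1]]        # new outer index is i + j

print(is_legal(interchange, deps))        # True
print(is_legal(reverse_inner, deps))      # False: (0, 1) becomes (0, -1)
print(is_legal(interchange, [(1, -1)]))   # False: (1, -1) becomes (-1, 1)
print(carried_by_outer(wavefront, deps))  # True: the inner loop can run in parallel
```

In this example the two loops are fully permutable, since interchange is legal, and the wavefront matrix moves every dependence onto the outer loop so that the inner loop carries none.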

Index Terms:
parallel algorithm; loop iterations; coarse grain parallelism; wavefront; loop transformation theory; general loops; dependence vectors; precedence constraints; lexicographically positive; legality; compound transformations; code transformation; fine-grain parallelism; maximum degree; coarsest fully permutable loop nests; fully permutable nests; canonical form; heuristics; parallel algorithms; parallel programming; program compilers
M.E. Wolf, M.S. Lam, "A Loop Transformation Theory and an Algorithm to Maximize Parallelism," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, Oct. 1991, doi:10.1109/71.97902