
M.E. Wolf, M.S. Lam, "A Loop Transformation Theory and an Algorithm to Maximize Parallelism," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, October 1991.
@article{ 10.1109/71.97902,
  author = {M.E. Wolf and M.S. Lam},
  title = {A Loop Transformation Theory and an Algorithm to Maximize Parallelism},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  volume = {2},
  number = {4},
  issn = {1045-9219},
  year = {1991},
  pages = {452--471},
  doi = {http://doi.ieeecomputersociety.org/10.1109/71.97902},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Index Terms: parallel algorithm; loop iterations; coarse grain parallelism; wavefront; loop transformation theory; general loops; dependence vectors; precedence constraints; lexicographically positive; legality; compound transformations; code transformation; fine-grain parallelism; maximum degree; coarsest fully permutable loop nests; fully permutable nests; canonical form; heuristics; parallel algorithms; parallel programming; program compilers
This paper presents an approach to transformations for general loops in which dependence vectors represent precedence constraints on the iterations of a loop. Because they encode precedence, the dependences extracted from a loop nest must be lexicographically positive. This leads to a simple legality test for compound transformations: any code transformation that leaves the dependences lexicographically positive is legal. The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest. It is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests. The canonical form of coarsest fully permutable nests can be transformed mechanically to yield maximum degrees of coarse- and/or fine-grain parallelism. Efficient heuristics can find the maximum degrees of parallelism for loops whose nesting level is less than five.
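The legality test described in the abstract can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that dependences are given as constant integer distance vectors and that each transformation is written as a small integer matrix acting on those vectors. The function names and the example dependence set are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[int]
Matrix = Sequence[Sequence[int]]


def is_lex_positive(v: Vector) -> bool:
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the all-zero vector is not lexicographically positive


def apply(t: Matrix, d: Vector) -> List[int]:
    """Matrix-vector product t * d."""
    return [sum(a * b for a, b in zip(row, d)) for row in t]


def compose(a: Matrix, b: Matrix) -> List[List[int]]:
    """Matrix product a * b: apply b first, then a."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def is_legal(t: Matrix, deps: Sequence[Vector]) -> bool:
    """Legal if every transformed dependence stays lexicographically positive."""
    return all(is_lex_positive(apply(t, d)) for d in deps)


# Assumed example: a two-deep nest with distance vectors (1, -1) and (0, 1).
deps = [(1, -1), (0, 1)]

INTERCHANGE = [[0, 1], [1, 0]]  # swap the two loops
SKEW = [[1, 0], [1, 1]]         # skew the inner loop by the outer loop

print(is_legal(INTERCHANGE, deps))                 # interchange alone
print(is_legal(SKEW, deps))                        # skew alone
print(is_legal(compose(INTERCHANGE, SKEW), deps))  # skew, then interchange
```

In this example interchange alone fails the test because (1, -1) becomes (-1, 1), while skewing first yields the vectors (1, 0) and (0, 1), after which interchange passes. The compound transformation is checked with a single matrix product rather than step by step, which is the convenience the legality test is meant to provide.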