Partitioning and Labeling of Loops by Unimodular Transformations
July 1992 (vol. 3 no. 4)
pp. 465-476
A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n^2) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.
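
The partitioning idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm, only a simplified stand-in under stated assumptions: loop bounds are ignored, so two iterations are placed in the same subset exactly when their index difference is an integer combination of the dependence vectors. The dependence vectors are brought into a triangular (column echelon) form using only unimodular column operations (swap, negate, add an integer multiple of one column to another), and each iteration is then reduced against that form to obtain a canonical label. The function names `lattice_basis` and `label` are hypothetical and chosen for this example.

```python
def lattice_basis(vectors):
    """Reduce the dependence vectors to column echelon form.

    Only unimodular column operations are used, so the integer lattice
    spanned by the columns is unchanged. Returns a list of
    (pivot_row, column) pairs with increasing pivot rows; each column is
    zero above its pivot row and positive at it.
    """
    cols = [list(v) for v in vectors]
    n = len(cols[0])
    basis = []
    pivot = 0
    for r in range(n):
        if pivot == len(cols):
            break
        # Euclidean elimination on row r across the unprocessed columns.
        while True:
            nz = [c for c in range(pivot, len(cols)) if cols[c][r] != 0]
            if len(nz) <= 1:
                break
            s = min(nz, key=lambda c: abs(cols[c][r]))
            for c in nz:
                if c != s:
                    q = cols[c][r] // cols[s][r]
                    cols[c] = [a - q * b for a, b in zip(cols[c], cols[s])]
        if nz:
            c = nz[0]
            cols[pivot], cols[c] = cols[c], cols[pivot]
            if cols[pivot][r] < 0:
                cols[pivot] = [-a for a in cols[pivot]]
            basis.append((r, cols[pivot]))
            pivot += 1
    return basis


def label(iteration, basis):
    """Reduce an iteration index modulo the lattice (assumption: no loop bounds).

    Iterations with equal labels differ by an integer combination of the
    dependence vectors; iterations with different labels do not.
    """
    v = list(iteration)
    for p, col in basis:
        q = v[p] // col[p]
        v = [a - q * b for a, b in zip(v, col)]
    return tuple(v)


if __name__ == "__main__":
    # Two constant dependence vectors in a 2-fold nested loop (made-up example).
    basis = lattice_basis([(1, 1), (1, -1)])
    for i in [(2, 4), (3, 4), (5, 7)]:
        print(i, "->", label(i, basis))
```

In this made-up example the two dependence vectors connect only iterations whose index sums have the same parity, so the sketch yields two labels. Under the stated assumptions, each distinct label identifies a subset that has no dependences on any other subset, so the subsets could be executed in parallel while iterations within a subset keep their original order.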

Index Terms:
loop partitioning; loop labelling; invariant dependence relation; labelling algorithm; unimodular transformations; independent subsets; constant dependence vectors; unimodular transformation; dependence matrix; partitioning algorithm; serial loop; parallel DO-ALL loops; dependent iterations; n-fold nested loop; multithreaded dynamic scheduling; join primitive; computational complexity; parallel algorithms; parallel programming; program compilers; programming theory; scheduling
Citation:
E.H. D'Hollander, "Partitioning and Labeling of Loops by Unimodular Transformations," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 4, pp. 465-476, July 1992, doi:10.1109/71.149964