Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
September 1994 (vol. 5 no. 9)
pp. 924-938

In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. To reduce or even eliminate interprocessor communication, the array elements in programs must be carefully distributed across the local memories of the processors for parallel execution. We focus on techniques for allocating the array elements of nested loops onto multicomputers in a communication-free fashion for parallelizing compilers. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks without interblock communication. The arrays can be partitioned under the communication-free criteria with nonduplicate or duplicate data. Finally, we propose a heuristic method for mapping the partitioned array elements and iterations onto fixed-size multicomputers while taking load balancing into account. Based on these methods, nested loops can execute without any communication overhead on distributed memory multicomputers. Moreover, we study the performance of the nonduplicate- and duplicate-data strategies for matrix multiplication.
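The duplicate-data idea described above can be illustrated for matrix multiplication: if each processor owns a contiguous row block of A (and of the result C) but holds a full duplicated copy of B, every element C[i][j] can be computed entirely from local data. The sketch below is a minimal sequential simulation of that partitioning, assuming the paper's row-block style of decomposition; the function names (`partition_rows`, `communication_free_matmul`) are illustrative, not taken from the paper.

```python
def partition_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into contiguous blocks, one per
    processor, balancing the load to within one row (heuristic mapping)."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def communication_free_matmul(A, B, n_procs=2):
    """Simulate communication-free C = A * B with duplicate data:
    processor p owns the rows of A and C in its block and a full copy
    of B, so each block is computed without interblock communication."""
    n, k = len(A), len(B[0])
    C = [[0.0] * k for _ in range(n)]
    for block in partition_rows(n, n_procs):
        # Everything read here (A rows in the block, all of B) is local
        # to the owning processor; no data crosses block boundaries.
        for i in block:
            for j in range(k):
                C[i][j] = sum(A[i][t] * B[t][j] for t in range(len(B)))
    return C
```

The price of eliminating communication is the duplicated storage for B on every processor, which is exactly the space/communication trade-off the paper evaluates for matrix multiplication.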

[1] U. Banerjee, "Unimodular transformations of double loops," in Proc. 3rd Workshop on Languages and Compilers for Parallel Computing, 1990, pp. 192-219.
[2] D. Callahan and K. Kennedy, "Compiling programs for distributed-memory multiprocessors," J. Supercomputing, vol. 2, pp. 151-169, Oct. 1988.
[3] S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear Algebra. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[4] K. Gallivan, W. Jalby, and D. Gannon, "On the problem of optimizing data transfers for complex memory systems," in Proc. 1988 ACM Int. Conf. Supercomput., St. Malo, France, July 1988, pp. 238-253.
[5] D. Gannon, W. Jalby, and K. Gallivan, "Strategies for cache and local memory management by global program transformation," J. Parallel Distrib. Computing, vol. 5, no. 5, pp. 587-616, Oct. 1988.
[6] M. Gupta and P. Banerjee, "Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers," IEEE Trans. Parallel Distrib. Syst., vol. 3, pp. 179-193, Mar. 1992.
[7] D. Hudak and S. Abraham, "Compiler techniques for data partitioning of sequentially iterated parallel loops," in Proc. ACM Int. Conf. Supercomput., June 1990, pp. 187-200.
[8] F. Irigoin and R. Triolet, "Supernode partitioning," in Proc. Fifteenth Annu. ACM SIGACT-SIGPLAN Symp. Principles Programming Languages, Jan. 1988, pp. 319-329.
[9] C. T. King, W. H. Chou, and L. M. Ni, "Pipelined data-parallel algorithms--part II: Design," IEEE Trans. Parallel Distrib. Syst., vol. 1, pp. 486-499, Oct. 1990.
[10] C. Koelbel, P. Mehrotra, and J. Van Rosendale, "Semi-automatic process partitioning for parallel computation," Int. J. Parallel Programming, vol. 16, no. 5, pp. 365-382, 1987.
[11] C. Koelbel and P. Mehrotra, "Compiling global name-space parallel loops for distributed execution," IEEE Trans. Parallel Distrib. Syst., vol. 2, pp. 440-451, Oct. 1991.
[12] L. Lamport, "The parallel execution of DO loops," Commun. ACM, vol. 17, no. 2, pp. 83-93, Feb. 1974.
[13] L. S. Liu, C. W. Ho, and J. P. Sheu, "On the parallelism of nested for-loops using index shift method," in Proc. 1990 Int. Conf. Parallel Processing, vol. II, 1990, pp. 119-123.
[14] M. Lu and J. Z. Fang, "A solution of the cache ping-pong problem in multiprocessor systems," J. Parallel Distrib. Computing, pp. 158-171, 1992.
[15] D. A. Padua, D. J. Kuck, and D. H. Lawrie, "High-speed multiprocessors and compilation techniques," IEEE Trans. Comput., vol. C-29, no. 9, pp. 763-776, Sept. 1980.
[16] D. A. Padua and M. J. Wolfe, "Advanced compiler optimizations for supercomputers," Commun. ACM, vol. 29, no. 12, pp. 1184-1201, Dec. 1986.
[17] J. Ramanujam and P. Sadayappan, "A methodology for parallelizing programs for complex memory multiprocessors," in Proc. Supercomputing '89, Reno, NV, Nov. 1989.
[18] J. Ramanujam and P. Sadayappan, "Compile-time techniques for data distribution in distributed memory machines," IEEE Trans. Parallel Distrib. Syst., vol. 2, pp. 472-482, Oct. 1991.
[19] A. Rogers and K. Pingali, "Process decomposition through locality of reference," in Proc. SIGPLAN '89 Conf. Programming Language Design and Implementation, 1989, pp. 69-80.
[20] J. P. Sheu and T. H. Tai, "Partitioning and mapping nested loops on multiprocessor systems," IEEE Trans. Parallel Distrib. Syst., vol. 2, pp. 430-439, Oct. 1991.
[21] M. E. Wolf and M. S. Lam, "A data locality optimizing algorithm," in Proc. ACM SIGPLAN Conf. Programming Language Design and Implementation, 1991, pp. 30-44.
[22] M. E. Wolf and M. S. Lam, "A loop transformation theory and an algorithm to maximize parallelism," IEEE Trans. Parallel Distrib. Syst., vol. 2, pp. 452-471, Oct. 1991.
[23] M. Wolfe, Optimizing Supercompilers for Supercomputers. Cambridge, MA: MIT Press, 1989.
[24] M. Wolfe, "More iteration space tiling," in Proc. Supercomputing '89, 1989, pp. 655-664.

Index Terms:
distributed memory systems; parallel programming; program compilers; storage allocation; communication-free data allocation techniques; parallelizing compilers; multicomputers; distributed memory multicomputers; local memory accesses; interprocessor communication; array elements; parallel execution; nested loops; nested loop; iteration space; interblock communication; communication-free criteria; duplicate data; heuristic method; partitioned array elements; fixed-size multicomputers; load balancing; communication overhead; matrix multiplication
T.S. Chen, J.P. Sheu, "Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 9, pp. 924-938, Sept. 1994, doi:10.1109/71.308531