Partitioning and Mapping Nested Loops on Multiprocessor Systems
October 1991 (vol. 2 no. 4)
pp. 430-439

A method for executing nested loops with constant loop-carried dependencies in parallel on message-passing multiprocessor systems, with reduced communication overhead, is presented. In the partitioning phase, the nested loop is divided into blocks so as to reduce interblock communication, without regard to the machine topology. The execution ordering of the iterations is defined by a given time function based on L. Lamport's (1974) hyperplane method. The iterations are then partitioned into blocks so that the execution ordering is not disturbed and the amount of interblock communication is minimized. In the mapping phase, the partitioned blocks are mapped onto a fixed-size multiprocessor system in such a manner that blocks that must exchange data frequently are allocated to the same processor or to neighboring processors. A heuristic mapping algorithm for hypercube machines is proposed.
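The two phases described above can be illustrated with a small sketch. This is not the paper's algorithm: the 2-deep loop nest, the dependence vectors (1,0) and (0,1), the blocking of consecutive hyperplanes, and the Gray-code placement are all simplifying assumptions chosen to make the idea concrete. Lamport's hyperplane method assigns each iteration (i, j) a time step t = i + j, so all iterations on one hyperplane are independent; grouping consecutive hyperplanes into blocks preserves that ordering, and a binary-reflected Gray code places consecutive blocks on adjacent hypercube nodes.

```python
def hyperplane_time(i, j):
    """Lamport's linear schedule for dependence vectors (1,0) and (0,1):
    every iteration on the hyperplane i + j = t can execute at step t."""
    return i + j

def partition(n, block):
    """Group the iterations of an n x n loop nest into blocks of `block`
    consecutive hyperplanes; blocking whole hyperplanes keeps the
    execution ordering given by hyperplane_time undisturbed."""
    blocks = {}
    for i in range(n):
        for j in range(n):
            b = hyperplane_time(i, j) // block
            blocks.setdefault(b, []).append((i, j))
    return blocks

def gray(k):
    """k-th binary-reflected Gray code; consecutive codes differ in one
    bit, i.e. they label neighboring nodes of a hypercube."""
    return k ^ (k >> 1)

def map_to_hypercube(blocks, dim):
    """Place block b on hypercube node gray(b mod 2**dim), so blocks that
    exchange data most often (consecutive b) land on the same node or on
    directly connected nodes."""
    nodes = 2 ** dim
    return {b: gray(b % nodes) for b in blocks}

blocks = partition(n=4, block=2)           # 4x4 nest, 2 hyperplanes per block
placement = map_to_hypercube(blocks, dim=2)  # 2-cube: 4 processors
for b in sorted(blocks):
    print("block", b, "-> node", placement[b], blocks[b])
```

For the 4 x 4 nest this yields four blocks placed on nodes 0, 1, 3, 2; each pair of consecutive blocks sits on hypercube neighbors (their node labels differ in exactly one bit), which is the property the paper's mapping phase aims for.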

References:
[1] W. C. Athas and C. L. Seitz, "Multicomputers: Message-passing concurrent computers," IEEE Comput. Mag., pp. 9-24, Aug. 1988.
[2] U. Banerjee, S. C. Chen, D. J. Kuck, and R. A. Towle, "Time and parallel processor bounds for Fortran-like loops," IEEE Trans. Comput., vol. C-28, pp. 660-670, Sept. 1979.
[3] M. Chen, "A design methodology for synthesizing parallel algorithms and architectures," J. Parallel Distributed Comput., pp. 461-491, Dec. 1986.
[4] M. C. Chen, "The generation of a class of multipliers: Synthesizing highly parallel algorithms in VLSI," IEEE Trans. Comput., vol. 37, pp. 329-338, Mar. 1988.
[5] E. H. D'Hollander, "Partitioning and labeling of index sets in do loops with constant dependence," in Proc. 1989 Int. Conf. Parallel Processing, vol. II, 1989, pp. 139-144.
[6] R. C. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, 1987.
[7] R. Gupta, "Synchronization and communication costs of loop partitioning on shared-memory multiprocessor systems," in Proc. 1989 Int. Conf. Parallel Processing, vol. II, Aug. 1989, pp. 23-30.
[8] C. T. King and L. M. Ni, "Pipelined data-parallel algorithms: Part II--design," IEEE Trans. Parallel Distributed Syst., vol. 1, pp. 486-499, Oct. 1990.
[9] L. Lamport, "The parallel execution of DO loops," Commun. ACM, vol. 17, no. 2, pp. 83-93, Feb. 1974.
[10] S. Lang, Linear Algebra. Reading, MA: Addison-Wesley, 1986.
[11] P.-Z. Lee and Z. M. Kedem, "Synthesizing linear array algorithms from nested for loop algorithms," IEEE Trans. Comput., vol. C-37, pp. 1578-1598, Dec. 1988.
[12] G. J. Li and B. W. Wah, "The design of optimal systolic arrays," IEEE Trans. Comput., vol. C-34, pp. 66-77, Jan. 1985.
[13] L. S. Liu, C. W. Ho, and J. P. Sheu, "On the parallelism of nested for-loops using index shift method," in Proc. 1990 Int. Conf. Parallel Processing, vol. II, PA, Aug. 1990, pp. 119-123.
[14] W. L. Miranker and A. Winkler, "Spacetime representations of computational structures," Computing, vol. 32, pp. 93-114, 1984.
[15] D. I. Moldovan and J. A. B. Fortes, "Partitioning and mapping algorithms into fixed size systolic arrays," IEEE Trans. Comput., vol. C-35, pp. 1-12, Jan. 1986.
[16] D. A. Padua, "Multiprocessors: Discussion of theoretical and practical problems," Ph.D. dissertation, Univ. of Illinois at Urbana-Champaign, Rep. UIUCDCS-R-79-990, Nov. 1979.
[17] D. A. Padua, D. J. Kuck, and D. H. Lawrie, "High-speed multiprocessors and compilation techniques," IEEE Trans. Comput., vol. C-29, pp. 763-776, Sept. 1980.
[18] J.-K. Peir and R. Cytron, "Minimum distance: A method for partitioning recurrences for multiprocessors," IEEE Trans. Comput., vol. 38, pp. 1203-1211, Aug. 1989.
[19] P. Sadayappan and F. Ercal, "Nearest-neighbor mappings of finite element graphs onto processor meshes," IEEE Trans. Comput., vol. C-36, pp. 1408-1424, Dec. 1987.
[20] W. Shang and J. A. B. Fortes, "Independent partitioning of algorithms with uniform dependencies," in Proc. 1988 Int. Conf. Parallel Processing, 1988, pp. 26-33.

Index Terms:
data exchange; mapping; nested loops; constant loop-carried dependencies; parallel; message-passing multiprocessor systems; communication overhead; partitioning; blocks; interblock communication; execution ordering; iterations; time function; hyperplane method; fixed-size multiprocessor system; heuristic mapping algorithm; hypercube machines; multiprogramming; parallel algorithms; parallel programming
J.P. Sheu, T.H. Thai, "Partitioning and Mapping Nested Loops on Multiprocessor Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 430-439, Oct. 1991, doi:10.1109/71.97900