Synchronization and Communication Costs of Loop Partitioning on Shared-Memory Multiprocessor Systems
July 1992 (vol. 3 no. 4)
pp. 505-512
The author presents strategies for static loop decomposition and scheduling as well as computer-assisted run-time scheduling that take into account, in addition to the cost of performing operations, the overhead costs associated with a decomposition and schedule. An algorithm for static decomposition of multidimensional loops based on the operation execution costs, communication costs, and synchronization costs is discussed. Synchronization instructions are introduced to ensure correct program execution following program decomposition. An algorithm for determining the explicit synchronization instruction that should be introduced to ensure correct execution of a program with arbitrarily nested loops is presented. Techniques for reducing run-time scheduling and communication and synchronization costs due to self-scheduling of multidimensional loops are also presented. Experiments performed on the Encore multiprocessor system demonstrate that the techniques developed can reduce overhead costs.
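The trade-off the abstract describes, between balancing work across processors and paying a scheduling and synchronization cost for each dispatch, can be illustrated with a small simulation. The following is a minimal sketch and not the paper's algorithm: the function names, the uniform per-iteration cost, and the fixed per-dispatch cost are all assumptions made for illustration, and contention on the shared loop counter is ignored.

```python
import heapq


def static_blocks(n_iters, n_procs):
    """Split range(n_iters) into n_procs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_iters, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def self_schedule(n_iters, n_procs, chunk, iter_cost, dispatch_cost):
    """Simulate chunked self-scheduling under an assumed cost model.

    Each idle processor grabs the next `chunk` iterations and pays
    `dispatch_cost` once per grab (hypothetical stand-in for the
    scheduling and synchronization overhead).
    Returns (finish_time, number_of_dispatches).
    """
    ready = [0.0] * n_procs  # times at which each processor becomes idle
    heapq.heapify(ready)
    next_iter, dispatches, finish = 0, 0, 0.0
    while next_iter < n_iters:
        t = heapq.heappop(ready)  # earliest idle processor takes the next chunk
        size = min(chunk, n_iters - next_iter)
        next_iter += size
        dispatches += 1
        done = t + dispatch_cost + size * iter_cost
        finish = max(finish, done)
        heapq.heappush(ready, done)
    return finish, dispatches


if __name__ == "__main__":
    print([len(b) for b in static_blocks(10, 4)])
    print(self_schedule(8, 2, 1, 1.0, 0.5))  # one iteration per dispatch
    print(self_schedule(8, 2, 4, 1.0, 0.5))  # four iterations per dispatch
```

Under these assumptions, larger chunks reduce the number of dispatches and therefore the total overhead, at the price of coarser load balancing when iteration costs vary.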

Index Terms:
static loop scheduling; loop partitioning; shared-memory multiprocessor systems; static loop decomposition; computer-assisted run-time scheduling; multidimensional loops; operation execution costs; communication costs; synchronization costs; program execution; program decomposition; synchronization instruction; nested loops; self-scheduling; Encore multiprocessor system; parallel algorithms; parallel programming; program compilers; programming theory; scheduling
R. Gupta, "Synchronization and Communication Costs of Loop Partitioning on Shared-Memory Multiprocessor Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 4, pp. 505-512, July 1992, doi:10.1109/71.149968