Efficient Processor Assignment Algorithms and Loop Transformations for Executing Nested Parallel Loops on Multiprocessors
January 1992 (vol. 3 no. 1)
pp. 71-82
An important issue in the efficient use of multiprocessor systems is the assignment of processors to nested parallel loops. A processor assignment algorithm should be fast and should always generate an optimal assignment. The paper proposes two efficient algorithms for deciding the optimal number of processors to assign to each individual loop, together with efficient parallel counterparts of both. These algorithms not only always generate an optimal processor assignment, but are also much faster than the existing optimal algorithm in the literature. The paper also discusses improving the performance of parallel execution by transforming a nested parallel loop into a semantically equivalent one. Three loop transformations are investigated, and it is observed that, in most cases, the parallel execution time improves after applying them.
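The abstract concerns choosing how many of a fixed pool of processors to devote to each level of a nested parallel loop so that the overall parallel execution time is minimized. The sketch below is not the paper's algorithm; it is a minimal brute-force illustration of the assignment problem under an assumed cost model in which the parallel time of a perfect loop nest is the product of the per-level spans ceil(N[i] / p[i]), subject to the product of the per-level processor counts not exceeding the total P. All names, the cost model, and the example inputs are assumptions for illustration only.

```python
# Hypothetical illustration (not the paper's algorithm): exhaustively search
# over processor assignments for a perfectly nested parallel loop.
# N[i] = iteration count of loop level i, P = total processors available.
# Assumed cost model: parallel time = prod_i ceil(N[i] / p[i]),
# feasible only when prod_i p[i] <= P.

from math import ceil
from itertools import product as cartesian


def optimal_assignment(N, P):
    """Return (best_time, best_p): the modeled optimal processor assignment."""
    best_time, best_p = float("inf"), None
    # Each level may receive between 1 and min(N[i], P) processors.
    choices = [range(1, min(n, P) + 1) for n in N]
    for p in cartesian(*choices):
        used = 1
        for pi in p:
            used *= pi
        if used > P:                 # infeasible: exceeds the processor pool
            continue
        time = 1
        for n, pi in zip(N, p):      # modeled time: product of per-level spans
            time *= ceil(n / pi)
        if time < best_time:
            best_time, best_p = time, p
    return best_time, best_p


if __name__ == "__main__":
    # e.g. a 3-deep parallel loop nest with 8, 6, and 10 iterations on 16 processors
    print(optimal_assignment([8, 6, 10], 16))
```

This exhaustive search is exponential in the nest depth and serves only to make the optimization problem concrete; the point of the paper is precisely to compute such an optimal assignment much faster than naive enumeration.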


Index Terms:
processor assignment algorithms; loop transformations; nested parallel loops; multiprocessors; parallel processors; performance; parallel execution; parallel algorithms; parallel programming; program compilers
Citation:
C.M. Wang, S.D. Wang, "Efficient Processor Assignment Algorithms and Loop Transformations for Executing Nested Parallel Loops on Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 1, pp. 71-82, Jan. 1992, doi:10.1109/71.113083