Scheduling DAG's for Asynchronous Multiprocessor Execution
May 1994 (vol. 5 no. 5)
pp. 498-508

A new approach is given for scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors. The key idea in our approach is to exploit the fine-grained parallelism present in the instruction stream. In this context, schedules are constructed by carefully balancing execution and communication costs at the level of individual instructions and their data dependencies. Three methods are used to evaluate our approach. First, several existing methods are extended to the fine-grained situation. Our approach is then compared to these methods using both static schedule-length analyses and simulated executions of the scheduled code. In each instance, our method is found to provide significantly shorter schedules. Second, by varying parameters such as the speed of the instruction set and the speed/parallelism of the interconnection structure, simulation techniques are used to examine the effects of various architectural considerations on the executions of the schedules. These results show that our approach provides significant speedups in a wide range of situations. Third, schedules produced by our approach are executed on a two-processor Data General shared-memory multiprocessor system. These experiments show a strong correlation between our simulation results and the actual executions, and thereby serve to validate the simulation studies. Together, our results establish that fine-grained parallelism can be exploited in a substantial manner when scheduling a sequential instruction stream for execution "in parallel" on asynchronous multiprocessors.
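The abstract does not spell out the scheduling algorithm itself. To illustrate the execution/communication trade-off it describes, here is a minimal sketch of a generic greedy list scheduler (an earliest-finish-time heuristic, not the authors' method). It visits the instructions of a data-dependence DAG in topological order and places each one on the processor where it would finish earliest. Whenever an operand comes from a different processor, it charges a fixed communication delay. The function name `schedule_dag`, the single uniform `comm_cost`, and the model in which a processor runs one instruction at a time are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def schedule_dag(exec_cost, edges, num_procs, comm_cost):
    """Greedy earliest-finish-time list scheduling of an instruction DAG.

    Illustrative sketch only; not the algorithm from the paper.
    exec_cost: dict mapping instruction -> execution time
    edges:     iterable of (producer, consumer) data dependencies
    num_procs: number of processors
    comm_cost: assumed fixed delay for a value produced on another processor
    Returns (placement, makespan), where placement maps each instruction
    to (processor, start_time, finish_time).
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {v: 0 for v in exec_cost}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: visit instructions in a dependence-respecting order.
    ready = deque(v for v in exec_cost if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succs[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(exec_cost):
        raise ValueError("dependence graph contains a cycle")

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placement = {}
    for v in order:
        best = None
        for p in range(num_procs):
            # Operands are available once every predecessor has finished,
            # plus comm_cost if that predecessor ran on a different processor.
            data_ready = 0
            for u in preds[v]:
                up, _, u_finish = placement[u]
                arrival = u_finish + (0 if up == p else comm_cost)
                data_ready = max(data_ready, arrival)
            start = max(proc_free[p], data_ready)
            finish = start + exec_cost[v]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placement[v] = best
        proc_free[best[0]] = best[2]

    makespan = max((f for _, _, f in placement.values()), default=0)
    return placement, makespan


# c = a + b, where a and b are independent two-cycle operations.
costs = {"a": 2, "b": 2, "c": 2}
deps = [("a", "c"), ("b", "c")]
for procs, comm in [(1, 1), (2, 1), (2, 3)]:
    _, length = schedule_dag(costs, deps, procs, comm)
    print(f"{procs} processor(s), comm_cost={comm}: schedule length {length}")
```

In this toy DAG, one processor needs 6 time units. Two processors with a communication cost of 1 finish at 5. With a cost of 3, however, the greedy choice to run `b` remotely stretches the schedule to 7, longer than the sequential one. A rule that only looks at the next instruction cannot foresee that penalty, which is why instruction-level scheduling has to weigh communication cost against available parallelism rather than simply maximize concurrency.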

Index Terms: scheduling; shared memory systems; parallel programming; multiprocessing programs; instruction sets; DAG; asynchronous multiprocessor execution; sequential instruction stream; fine grained parallelism; execution costs; communication costs; data dependencies; Data General shared memory multiprocessor system; concurrency; parallelism
B.A. Malloy, E.L. Lloyd, M.L. Soffa, "Scheduling DAG's for Asynchronous Multiprocessor Execution," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 5, pp. 498-508, May 1994, doi:10.1109/71.282560