FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors
February 1999 (vol. 10 no. 2)
pp. 147-159

Abstract: In the area of parallelizing compilers, considerable research has been carried out on data dependency analysis, parallelism extraction, and program and data partitioning. However, designing a practical, low-complexity scheduling algorithm without sacrificing performance remains a challenging problem. A variety of heuristics have been proposed to generate efficient solutions, but they take prohibitively long to execute for moderate-size or large problems. In this paper, we propose an algorithm called FASTEST (Fast Assignment and Scheduling of Tasks using an Efficient Search Technique) that has O(e) time complexity, where e is the number of edges in the task graph. The algorithm first generates an initial solution in a short time and then refines it by using a simple but robust random neighborhood search. We have also parallelized the search to further lower the time complexity. We are using the algorithm in a prototype automatic parallelization and scheduling tool that compiles sequential code and generates parallel code optimized with judicious scheduling. The proposed algorithm is evaluated with several application programs and outperforms a number of previous algorithms by generating parallelized code with shorter execution times, while taking dramatically shorter scheduling times. The FASTEST algorithm generates optimal solutions for a majority of the test cases and close-to-optimal solutions for the rest.
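
The two-phase idea in the abstract (build an initial solution quickly, then refine it with a random neighborhood search) can be illustrated with a small sketch. This is not the FASTEST algorithm itself: the abstract does not give its details, so the round-robin initial placement, the "move one random task to one random processor" neighborhood, and all function names below are assumptions made for illustration. The sketch also re-evaluates the whole schedule at every step, so it makes no attempt to match the O(e) bound.

```python
import random
from collections import defaultdict


def topological_order(n, edges):
    """Return tasks 0..n-1 in an order that respects every edge (u, v)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [t for t in range(n) if indeg[t] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def schedule_length(order, assign, cost, comm, num_procs):
    """Makespan when tasks run in `order` on the processors in `assign`.

    `comm` maps an edge (u, v) to a communication cost that is paid only
    when u and v sit on different processors (an assumed cost model).
    """
    preds = defaultdict(list)
    for (u, v), c in comm.items():
        preds[v].append((u, c))
    finish = {}
    proc_free = [0] * num_procs
    for t in order:
        p = assign[t]
        data_ready = max(
            (finish[u] + (c if assign[u] != p else 0) for u, c in preds[t]),
            default=0,
        )
        start = max(proc_free[p], data_ready)
        finish[t] = start + cost[t]
        proc_free[p] = finish[t]
    return max(finish.values())


def random_neighborhood_search(cost, comm, num_procs, steps=200, seed=0):
    """Refine a quick initial assignment by random single-task moves."""
    rng = random.Random(seed)
    n = len(cost)
    order = topological_order(n, list(comm))
    # Phase 1 (assumed): cheap round-robin placement as the initial solution.
    assign = [t % num_procs for t in range(n)]
    best = schedule_length(order, assign, cost, comm, num_procs)
    # Phase 2: try random moves and keep only those that shorten the schedule.
    for _ in range(steps):
        t, p = rng.randrange(n), rng.randrange(num_procs)
        old = assign[t]
        if p == old:
            continue
        assign[t] = p
        length = schedule_length(order, assign, cost, comm, num_procs)
        if length < best:
            best = length
        else:
            assign[t] = old  # undo a move that did not help
    return assign, best


# Hypothetical three-task chain with expensive communication on both edges.
chain_cost = [1, 1, 1]
chain_comm = {(0, 1): 10, (1, 2): 10}
chain_assign, chain_length = random_neighborhood_search(chain_cost, chain_comm, 2)
print(chain_assign, chain_length)
```

Because a move is kept only when it strictly shortens the schedule, the returned length is never worse than that of the initial placement. On the chain above, every edge that crosses processors costs 10 time units, so the search is pushed toward placing all three tasks on one processor.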


Index Terms:
Automatic parallelization, compile-time scheduling, task graphs, multiprocessors, parallel processing, parallel programming tool, parallel algorithm, random neighborhood search.
Citation:
Yu-Kwong Kwok, Ishfaq Ahmad, "FASTEST: A Practical Low-Complexity Algorithm for Compile-Time Assignment of Parallel Programs to Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 2, pp. 147-159, Feb. 1999, doi:10.1109/71.752781