Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems
April 2003 (vol. 14 no. 4)
pp. 394-407
Heejo Lee, IEEE
Jong Kim, IEEE
Sung Je Hong
Sunggu Lee, IEEE

Abstract—The problem of finding an optimal product sequence for sequential multiplication of a chain of matrices (the matrix chain ordering problem, MCOP) is well known and has been studied for a long time. In this paper, we consider the problem of finding an optimal product schedule for evaluating a chain of matrix products on a parallel computer (the matrix chain scheduling problem, MCSP). The difference between the MCSP and the MCOP is that the MCOP pertains to a product sequence for single-processor systems, whereas the MCSP pertains to a sequence of concurrent matrix products for parallel systems. Parallelizing each matrix product after finding an optimal product sequence for a single-processor system does not always guarantee the minimum evaluation time on a parallel system, since each parallelized matrix product may use processors inefficiently. We introduce a new processor scheduling algorithm for the MCSP that reduces the evaluation time of a chain of matrix products on a parallel computer, even at the expense of a slight increase in the total number of operations. Given a chain of n matrices and a matrix product utilizing at most P/k processors in a P-processor system, the proposed algorithm approaches k(n-1)/(n + k log k - k) times the performance of parallel evaluation using the optimal sequence found for the MCOP. Also, experiments performed on a Fujitsu AP1000 multicomputer show that the proposed algorithm significantly decreases the time required to evaluate a chain of matrix products in parallel systems.
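For readers unfamiliar with the MCOP baseline the abstract compares against, the following is a minimal sketch of the classic textbook O(n^3) dynamic program for the sequential ordering problem (not the paper's MCSP scheduling algorithm), together with a helper that evaluates the abstract's ratio k(n-1)/(n + k log k - k). The function names are hypothetical, and the helper assumes a base-2 logarithm, which the abstract does not state.

```python
import math
from typing import List, Tuple


def matrix_chain_order(dims: List[int]) -> Tuple[int, List[List[int]]]:
    """Classic O(n^3) dynamic program for the sequential MCOP.

    dims has length n + 1; matrix i (0-based) has shape dims[i] x dims[i + 1].
    Returns the minimum number of scalar multiplications and the split table.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):            # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = math.inf
            for s in range(i, j):             # last product: (A_i..A_s)(A_{s+1}..A_j)
                c = (cost[i][s] + cost[s + 1][j]
                     + dims[i] * dims[s + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = s
    return cost[0][n - 1], split


def parenthesize(split: List[List[int]], i: int, j: int) -> str:
    """Rebuild the optimal product sequence from the split table."""
    if i == j:
        return f"A{i + 1}"
    s = split[i][j]
    return f"({parenthesize(split, i, s)} x {parenthesize(split, s + 1, j)})"


def ratio_bound(n: int, k: int) -> float:
    """Evaluate k(n-1) / (n + k log k - k); base-2 log is an ASSUMPTION."""
    return k * (n - 1) / (n + k * math.log2(k) - k)


if __name__ == "__main__":
    best, split = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
    print(best, parenthesize(split, 0, 5))
    print(ratio_bound(100, 4))
```

With the six-matrix chain in the usage example, the sketch finds 15125 scalar multiplications with the order ((A1 x (A2 x A3)) x ((A4 x A5) x A6)). For the ratio, k = 1 gives exactly 1, and for fixed k the value tends to k as n grows, since k(n-1)/(n + k log k - k) → k when n → ∞, regardless of the logarithm's base.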

Index Terms:
Matrix chain product, parallel matrix multiplication, matrix chain scheduling problem, processor allocation, task scheduling.
Heejo Lee, Jong Kim, Sung Je Hong, Sunggu Lee, "Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 394-407, April 2003, doi:10.1109/TPDS.2003.1195411