Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System
May 2001 (vol. 50 no. 5)
pp. 519-525

Abstract: The known fast sequential algorithms for multiplying two $N\times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that, for all $1 \le p \le N^{\alpha}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O\left(\frac{N^{\alpha}}{p}+\frac{N^2}{p^{2/\alpha}}\log p\right)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor LARPBS in $O\left(\frac{N^{2.3755}}{p}+\frac{N^2}{p^{0.8419}}\log p\right)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N\times N$ matrices can be performed on an LARPBS with $O(N^\alpha)$ processors in $O(\log N)$ time. This compares favorably with the performance on a PRAM.
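The numeric exponents in the abstract can be checked against the general bound. The sketch below is a reader's reconstruction, not the authors' derivation: it assumes that linear speedup means the second term is dominated by $N^{\alpha}/p$, that is, $p^{1-2/\alpha}\log p = O(N^{\alpha-2})$, and that $\log p = O(\log N)$ because $p \le N^{\alpha}$. Under those assumptions the processor exponent is $2/\alpha$ and the logarithm exponent is $\alpha/(\alpha-2)$. The function names are hypothetical, and constant factors hidden by the $O$-notation are dropped.

```python
import math

ALPHA = 2.3755  # exponent value quoted in the abstract


def derived_exponents(alpha):
    """Return (2/alpha, alpha/(alpha - 2)).

    Assumption: the first value is the exponent of p in the second term
    of the bound; the second is the power of log N in the largest p for
    which the second term stays within a constant factor of N^alpha / p.
    """
    return 2 / alpha, alpha / (alpha - 2)


def bound_terms(n, p, alpha):
    """Evaluate the two terms of the stated bound with constants dropped.

    Returns (n^alpha / p, (n^2 / p^(2/alpha)) * log2(p)).
    The base of the logarithm is an arbitrary choice for illustration.
    """
    first = n ** alpha / p
    second = (n ** 2 / p ** (2 / alpha)) * math.log2(p)
    return first, second


p_exp, log_exp = derived_exponents(ALPHA)
print(f"2/alpha           = {p_exp:.4f}")
print(f"alpha/(alpha - 2) = {log_exp:.4f}")

# Evaluate both terms at the claimed speedup threshold for one sample size.
n = 1024
p = n ** ALPHA / math.log2(n) ** log_exp
first, second = bound_terms(n, p, ALPHA)
print(f"second/first at threshold = {second / first:.4f}")
```

At the threshold $p = N^{\alpha}/(\log N)^{\alpha/(\alpha-2)}$, the ratio of the second term to the first simplifies to $\log p/\log N$, which is at most $\alpha$, so the two terms differ only by a constant factor there.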

Index Terms:
Bilinear algorithm, cost-optimality, distributed memory system, linear array, matrix multiplication, optical pipelined bus, PRAM, reconfigurable system, speedup.
Citation:
Keqin Li, Victor Y. Pan, "Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System," IEEE Transactions on Computers, vol. 50, no. 5, pp. 519-525, May 2001, doi:10.1109/12.926164