Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System
Keqin Li, Yi Pan, Si Qing Zheng
IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 8, pp. 705-720, August 1998

Abstract: We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems can support a large volume of parallel communication in a variety of patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and can therefore support parallel implementations of divide-and-conquer computations such as Strassen's algorithm. The main contributions of the paper are as follows. We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model, namely MM1, MM2, MM3, and the compound algorithms ${\cal C}_1(\epsilon)$ and ${\cal C}_2(\delta)$. Algorithm ${\cal C}_1(\epsilon)$ has a time complexity that is adjustable within the sublinear range. Algorithm ${\cal C}_2(\delta)$ implies that it is feasible to achieve sublogarithmic time using $o(N^3)$ processors for matrix multiplication on a realistic system. Algorithms MM3, ${\cal C}_1(\epsilon)$, and ${\cal C}_2(\delta)$ all have $o(N^3)$ cost and hence are very processor efficient. Algorithms MM1, MM3, and ${\cal C}_1(\epsilon)$ are general-purpose matrix multiplication algorithms in which the array elements may belong to any ring. Algorithms MM2 and ${\cal C}_2(\delta)$ are applicable to array elements that are integers of bounded magnitude, floating-point values of bounded precision and magnitude, or Boolean values. Extensions of algorithms MM2 and ${\cal C}_2(\delta)$ to unbounded integers and reals are also discussed.
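As a point of reference for the divide-and-conquer structure mentioned in the abstract, the following is a minimal sequential sketch of Strassen's algorithm in Python. It is not one of the paper's LARPBS algorithms and makes no attempt to model the bus system. The helper names (`mat_add`, `mat_sub`, `split`, `strassen`) are illustrative, and the sketch assumes square matrices whose size is a power of two. The seven recursive products do not depend on one another, which is the property that would let a reconfigurable system hand each of them to an independent subsystem.

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a 2h x 2h matrix into four h x h quadrants."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def strassen(A, B):
    """Multiply square matrices A and B (size assumed to be a power of two)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven half-size products; none depends on another, so a parallel
    # machine could evaluate them concurrently.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M2 = strassen(mat_add(A21, A22), B11)
    M3 = strassen(A11, mat_sub(B12, B22))
    M4 = strassen(A22, mat_sub(B21, B11))
    M5 = strassen(mat_add(A11, A12), B22)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))

    # Combine the products into the four quadrants of the result.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The sketch uses only addition, subtraction, and multiplication of elements, so it applies to entries from any ring, in line with the abstract's description of the general-purpose algorithms.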

Index Terms:
Compound algorithm, linear array, matrix multiplication, optical pipelined bus, reconfigurability, Strassen's algorithm.
Citation:
Keqin Li, Yi Pan, Si Qing Zheng, "Fast and Processor Efficient Parallel Matrix Multiplication Algorithms on a Linear Array With a Reconfigurable Pipelined Bus System," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 8, pp. 705-720, Aug. 1998, doi:10.1109/71.706044