Abstract—The known fast sequential algorithms for multiplying two $N\times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that, for all $1 \le p \le N^{\alpha}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O\bigl(\frac{N^{\alpha}}{p}+\frac{N^{2}}{p^{2/\alpha}}\log p\bigr)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed-memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor LARPBS in $O\bigl(\frac{N^{2.3755}}{p}+\frac{N^{2}}{p^{0.8419}}\log p\bigr)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N\times N$ matrices can be performed on an LARPBS with $O(N^\alpha)$ processors in $O(\log N)$ time. This compares favorably with the performance on a PRAM.
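The exponent $6.3262$ in the speedup bound is not derived in the abstract itself; as a sketch (not part of the original text), it can be checked from the stated running time by choosing $p = N^{\alpha}/(\log N)^{\alpha/(\alpha-2)}$, which makes the communication term no larger than the computation term:

% Sketch only: verifying the linear-speedup bound implied by the stated
% running time T(N,p) = O(N^alpha/p + (N^2/p^{2/alpha}) log p).
% Linear speedup requires the second term to be O of the first:
\[
  \frac{N^{2}}{p^{2/\alpha}}\log p \;=\; O\!\Bigl(\frac{N^{\alpha}}{p}\Bigr)
  \quad\Longleftrightarrow\quad
  p^{(\alpha-2)/\alpha}\log p \;=\; O\!\bigl(N^{\alpha-2}\bigr).
\]
% Substituting p = N^{alpha}/(log N)^{alpha/(alpha-2)}, so that log p = O(log N):
\[
  p^{(\alpha-2)/\alpha}\log p
  \;=\; \frac{N^{\alpha-2}}{\log N}\cdot O(\log N)
  \;=\; O\!\bigl(N^{\alpha-2}\bigr).
\]
% For alpha = 2.3755, the exponent alpha/(alpha-2) = 2.3755/0.3755 ≈ 6.3262,
% matching the O(N^{2.3755}/(log N)^{6.3262}) processor bound quoted above.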
Index Terms—Bilinear algorithm, cost-optimality, distributed memory system, linear array, matrix multiplication, optical pipelined bus, PRAM, reconfigurable system, speedup.
Victor Y. Pan, Keqin Li, "Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System," IEEE Transactions on Computers, vol. 50, no. 5, pp. 519-525, May 2001, doi:10.1109/12.926164.