<p><b>Abstract</b>—The known fast sequential algorithms for multiplying two <tmath>$N\times N$</tmath> matrices (over an arbitrary ring) have time complexity <tmath>$O(N^\alpha)$</tmath>, where <tmath>$2 < \alpha < 3$</tmath>. The current best value of <tmath>$\alpha$</tmath> is less than 2.3755. We show that, for all <tmath>$1 \le p \le N^{\alpha}$</tmath>, multiplying two <tmath>$N\times N$</tmath> matrices can be performed on a <it>p</it>-processor linear array with a reconfigurable pipelined bus system (LARPBS) in <tmath>$O({N^{\alpha}\over p}+({N^2\over p^{2/\alpha}})\log p)$</tmath> time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all <tmath>$1 \le p \le N^{2.3755}$</tmath>, multiplying two <tmath>$N\times N$</tmath> matrices can be performed on a <it>p</it>-processor LARPBS in <tmath>$O({N^{2.3755}\over p}+({N^2\over p^{0.8419}})\log p)$</tmath> time, and linear speedup can be achieved for <tmath>$p$</tmath> as large as <tmath>$O(N^{2.3755}/(\log N)^{6.3262})$</tmath>. Furthermore, multiplying two <tmath>$N\times N$</tmath> matrices can be performed on an LARPBS with <tmath>$O(N^\alpha)$</tmath> processors in <tmath>$O(\log N)$</tmath> time. This compares favorably with the performance on a PRAM.</p>
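<p>The constants 0.8419 and 6.3262 quoted above follow from <tmath>$\alpha = 2.3755$</tmath>: the first is <tmath>$2/\alpha$</tmath>, and the second is <tmath>$\alpha/(\alpha-2)$</tmath>, obtained by requiring the communication term <tmath>$(N^2/p^{2/\alpha})\log p$</tmath> not to exceed the computation term <tmath>$N^\alpha/p$</tmath> and taking <tmath>$\log p = \Theta(\log N)$</tmath>. The following is a minimal sketch that reproduces this arithmetic. The function names are hypothetical, the logarithm base (2) is an assumption, and the hidden constants of the <tmath>$O(\cdot)$</tmath> bound are ignored, so the values indicate growth rates only, not running times.</p>

```python
import math


def larpbs_exponents(alpha):
    """Return (2/alpha, alpha/(alpha-2)) for a sequential exponent alpha.

    2/alpha is the exponent of p in the communication term; alpha/(alpha-2)
    is the power of log N dividing N**alpha in the linear-speedup range.
    """
    if not 2 < alpha < 3:
        raise ValueError("alpha must satisfy 2 < alpha < 3")
    return 2 / alpha, alpha / (alpha - 2)


def time_bound(n, p, alpha):
    """Evaluate N^alpha/p + (N^2/p^(2/alpha)) * log p, ignoring constants.

    Assumption: log is taken base 2; for p = 1 the log factor is zero.
    """
    compute = n ** alpha / p
    comm = (n ** 2 / p ** (2 / alpha)) * math.log2(p)
    return compute + comm


def max_linear_speedup_processors(n, alpha):
    """Return N^alpha / (log N)^(alpha/(alpha-2)), ignoring constants."""
    _, log_power = larpbs_exponents(alpha)
    return n ** alpha / math.log2(n) ** log_power


if __name__ == "__main__":
    p_exp, log_power = larpbs_exponents(2.3755)
    print(f"2/alpha = {p_exp:.4f}, alpha/(alpha-2) = {log_power:.4f}")
```

<p>At the processor count returned by <tmath>$\texttt{max\_linear\_speedup\_processors}$</tmath>, the ratio of the communication term to the computation term is at most <tmath>$\alpha$</tmath> under these assumptions, which is why the total time remains <tmath>$O(N^\alpha/p)$</tmath> in that range.</p>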