<p><b>Abstract</b>—Given <tmath>N</tmath> matrices <tmath>A_{1}, A_{2}, \ldots, A_{N}</tmath> of size <tmath>N \times N</tmath>, the matrix chain product problem is to compute <tmath>A_{1} \times A_{2} \times \cdots \times A_{N}</tmath>. Given an <tmath>N \times N</tmath> matrix <tmath>A</tmath>, the matrix powers problem is to calculate the first <tmath>N</tmath> powers of <tmath>A</tmath>, that is, <tmath>A, A^{2}, A^{3}, \ldots, A^{N}</tmath>. We solve the two problems on distributed memory systems (DMSs) with <tmath>p</tmath> processors that support one-to-one communications in <tmath>T(p)</tmath> time. Assume that the fastest sequential matrix multiplication algorithm runs in <tmath>O(N^{\alpha})</tmath> time, where the best currently known value of <tmath>\alpha</tmath> is less than 2.3755. Let <tmath>p</tmath> be any value in the range <tmath>1 \leq p \leq N^{\alpha + 1}/(\log N)^{2}</tmath>. We show that the two problems can be solved by a DMS with <tmath>p</tmath> processors in <tmath>T_{\rm chain}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{N}\right)^{1 - 2/\alpha} + \log^{+}\left(\frac{p\log N}{N^{\alpha}}\right)\log N\right)\right)</tmath> and <tmath>T_{\rm power}(N,p) = O\left(\frac{N^{\alpha + 1}}{p} + T(p)\left(\frac{N^{2(1 + 1/\alpha)}}{p^{2/\alpha}}\left(\log^{+}\frac{p}{2\log N}\right)^{1 - 2/\alpha} + (\log N)^{2}\right)\right)</tmath> time, respectively, where the function <tmath>\log^{+}</tmath> is defined as follows: <tmath>\log^{+}x = \log x</tmath> if <tmath>x \geq 1</tmath> and <tmath>\log^{+}x = 1</tmath> if <tmath>0 < x < 1</tmath>. We also give instantiations of the above results on several typical DMSs and show that computing the matrix chain product and matrix powers is fully scalable on distributed memory parallel computers (DMPCs), highly scalable on DMSs with hypercubic networks, and not highly scalable on DMSs with mesh and torus networks.</p>
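To make the two problem definitions concrete, here is a minimal sequential Python reference. It is an illustrative sketch, not the paper's parallel algorithm. It assumes square integer matrices stored as nested lists and uses schoolbook O(N^3) multiplication in place of an O(N^α) algorithm; the helper names `mat_mul`, `chain_product`, and `matrix_powers` are hypothetical. Both functions are organized in rounds of mutually independent multiplications (pairwise reduction for the chain, repeated doubling for the powers), which is the kind of parallelism a DMS could exploit.

```python
from typing import List

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Schoolbook product; stands in for any O(N^alpha) algorithm."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


def chain_product(mats: List[Matrix]) -> Matrix:
    """Compute A_1 x A_2 x ... x A_N by pairwise reduction.

    Each round multiplies adjacent pairs; the products within a round are
    independent, so about log2(N) rounds suffice. Left-to-right order is
    kept because matrix multiplication is associative but not commutative.
    """
    level = list(mats)
    while len(level) > 1:
        nxt = [mat_mul(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired last matrix moves to the next round
        level = nxt
    return level[0]


def matrix_powers(a: Matrix, n: int) -> List[Matrix]:
    """Return [A, A^2, ..., A^n] (n >= 1) by repeated doubling.

    With A^1..A^k known, A^(k+1)..A^(2k) are A^i x A^k for i = 1..k; those
    products are independent, so about log2(n) rounds suffice.
    """
    powers = [a]
    while len(powers) < n:
        k = len(powers)
        top = powers[-1]  # A^k
        block = [mat_mul(powers[i], top) for i in range(min(k, n - k))]
        powers.extend(block)
    return powers
```

For example, with `A = [[1, 1], [0, 1]]`, `matrix_powers(A, 5)` returns the matrices `[[1, k], [0, 1]]` for k = 1, ..., 5.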
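The <tmath>\log^{+}</tmath> function and the two time bounds can be evaluated numerically to see how the computation term <tmath>N^{\alpha+1}/p</tmath> trades off against the communication term as <tmath>p</tmath> grows. This is a minimal Python sketch that assumes base-2 logarithms and sets every constant hidden by the <tmath>O(\cdot)</tmath> notation to 1, since the bounds fix neither. The names `log_plus`, `t_chain`, `t_power`, and the callable `t` standing for <tmath>T(p)</tmath> are hypothetical.

```python
import math
from typing import Callable


def log_plus(x: float) -> float:
    """log^+ as defined in the abstract: log x if x >= 1, and 1 if 0 < x < 1."""
    if x <= 0:
        raise ValueError("log^+ is defined only for x > 0")
    return math.log2(x) if x >= 1 else 1.0  # base 2 is an assumption


def _check(n: int, p: float, alpha: float) -> None:
    # Any matrix multiplication must touch all N^2 entries, so 2 <= alpha <= 3;
    # n >= 2 keeps log N positive.
    if n < 2 or p < 1 or not 2.0 <= alpha <= 3.0:
        raise ValueError("need n >= 2, p >= 1 and 2 <= alpha <= 3")


def t_chain(n: int, p: float, alpha: float, t: Callable[[float], float]) -> float:
    """Shape of T_chain(N, p) with every hidden constant set to 1."""
    _check(n, p, alpha)
    lg = math.log2(n)
    comm = (n ** (2 * (1 + 1 / alpha)) / p ** (2 / alpha)) * log_plus(p / n) ** (1 - 2 / alpha)
    comm += log_plus(p * lg / n ** alpha) * lg
    return n ** (alpha + 1) / p + t(p) * comm


def t_power(n: int, p: float, alpha: float, t: Callable[[float], float]) -> float:
    """Shape of T_power(N, p) with every hidden constant set to 1."""
    _check(n, p, alpha)
    lg = math.log2(n)
    comm = (n ** (2 * (1 + 1 / alpha)) / p ** (2 / alpha)) * log_plus(p / (2 * lg)) ** (1 - 2 / alpha)
    comm += lg ** 2
    return n ** (alpha + 1) / p + t(p) * comm
```

Passing a constant such as `t=lambda p: 1.0` models one-to-one communication in constant time; other networks can be explored by supplying their own <tmath>T(p)</tmath>. Because constants are dropped, only growth trends, not absolute running times, are meaningful.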
Cost optimality, distributed memory parallel computer, distributed memory system, dynamic processor allocation, hypercubic network, matrix chain product, matrix multiplication, matrix power, mesh, scalability, speedup, torus.

K. Li, "Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems," in IEEE Transactions on Parallel & Distributed Systems, vol. 18, pp. 865-878, 2007.