IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06)
An Improved Systolic Architecture for LU Decomposition
Steamboat Springs, Colorado, USA
September 11-September 13
ISBN: 0-7695-2682-9
LU-Decomposition is a classic problem for which many systolic array implementations have been proposed, the best of which takes 3n-3 time on n2/2 PEs, for a dense, n?n matrix. In this paper, we first give a proof that if only nearest neighbor communication is allowed, this time is a lower bound. We then generalize it to 2n + n/k - 3 time if one allows k-bounded broadcasts (i.e., if it takes m/k time steps to broadcast a value to m destination nodes). We also present a new architecture with this improved execution time, which uses n2/2 PEs, each one consisting of two multiplier-subtractor units, but active only on alternate cycles. This leads to a speedup and efficiency of kn2/(6k+3) and 2k/(6k + 3) respectively. For k = 1, our proposed architecture achieves the performance of the best previously known systolic array implementation. Special cases of our results include similar improvements to algorithms for solving (upper and lower) triangular linear systems by (forward and backward) substitution.
Citation:
DaeGon Kim, Sanjay V. Rajopadhye, "An Improved Systolic Architecture for LU Decomposition," asap, pp.231-238, IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06), 2006