1996 International Conference on Parallel Processing (ICPP'96) - Volume 2
(R) A Three-Parameter Fast Givens QR Algorithm for Superscalar Processors
Bloomington, IL
August 12-August 16
ISBN: 0-8186-7623-x
J.J. Carrig, Jr., Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
G.G.L. Meyer, Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
Abstract: We present a three parameter fast Givens QR algorithm that exploits parallelism to improve performance on superscalar processors. We provide a selection of parameter values for which the new algorithm reduces to the standard algorithm, but show that non-standard values minimize the number of cache misses, memory references and pipeline stalls. Using a tractable model of a superscalar machine architecture, we derive rules for estimating the optimal combination of parameter values. Applying these rules, we observe a speedup over the standard algorithm of 2.4 on the Intel Pentium Pro system, 2.0 on a single thin POWER2 processor of the IBM SP2, 1.6 on a single wide POWER2 processor of the IBM SP2, and 4.2 on a single R8000 processor of the SGI POWER Challenge XL.
Index Terms:
parallel processing; performance evaluation; signal processing; least squares approximations; Kalman filters; eigenvalues and eigenfunctions; fast Givens QR algorithm; superscalar processors; parallelism; performance improvement; parameter values; cache misses; memory references; pipeline stalls; tractable model; superscalar machine architecture; Intel Pentium Pro system; POWER2 processor; IBM SP2; single R8000 processor; SGI POWER Challenge XL
Citation:
J.J. Carrig, Jr., G.G.L. Meyer, "(R) A Three-Parameter Fast Givens QR Algorithm for Superscalar Processors," icpp, vol. 2, pp.0011, 1996 International Conference on Parallel Processing (ICPP'96) - Volume 2, 1996