Floating Point Fault Tolerance with Backward Error Assertions
February 1995 (vol. 44 no. 2)
pp. 302-311

Abstract: This paper introduces an assertion scheme based on the backward error analysis for error detection in algorithms that solve dense systems of linear equations, $A\mathbf{x} = \mathbf{b}$. Unlike previous methods, this Backward Error Assertion Model is specifically designed to operate in an environment of floating point arithmetic subject to round-off errors, and it can be easily instrumented in a Watchdog processor environment. The complexity of verifying assertions is $O(n^2)$, compared to the $O(n^3)$ complexity of algorithms solving $A\mathbf{x} = \mathbf{b}$. Unlike other proposed error detection methods, this assertion model does not require any encoding of the matrix $A$. Experimental results under various error models are presented to validate the effectiveness of this assertion scheme.
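The abstract does not state the exact form of the assertion, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes the textbook normwise backward error $\eta = \|\mathbf{b} - A\mathbf{x}\|_\infty / (\|A\|_\infty \|\mathbf{x}\|_\infty + \|\mathbf{b}\|_\infty)$ and a hypothetical acceptance threshold of `slack * n * EPS`. The names `solve`, `backward_error`, `assertion_holds`, and `slack` are chosen here for illustration. The solver costs $O(n^3)$, while the check needs only one matrix-vector product and a few norms, which is $O(n^2)$, and it uses the original, unencoded $A$.

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon for IEEE double precision


def solve(A, b):
    """Gaussian elimination with partial pivoting, O(n^3). Returns x."""
    n = len(A)
    # Work on an augmented copy so the caller's A and b stay intact.
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def backward_error(A, x, b):
    """Normwise relative backward error in the infinity norm, O(n^2)."""
    residual = [bi - sum(aij * xj for aij, xj in zip(row, x))
                for row, bi in zip(A, b)]
    norm_r = max(abs(r) for r in residual)
    norm_A = max(sum(abs(aij) for aij in row) for row in A)
    norm_x = max(abs(xj) for xj in x)
    norm_b = max(abs(bi) for bi in b)
    denom = norm_A * norm_x + norm_b
    return norm_r / denom if denom > 0.0 else 0.0


def assertion_holds(A, x, b, slack=100.0):
    """Assumed threshold: accept x if the backward error <= slack * n * EPS."""
    return backward_error(A, x, b) <= slack * len(A) * EPS


if __name__ == "__main__":
    A = [[4.0, -2.0, 1.0], [3.0, 6.0, -4.0], [2.0, 1.0, 8.0]]
    b = [12.0, -25.0, 32.0]
    x = solve(A, b)
    print("fault-free solution accepted:", assertion_holds(A, x, b))
    faulty = list(x)
    faulty[1] += 1e-3  # simulate a fault that corrupts one component
    print("corrupted solution accepted:", assertion_holds(A, faulty, b))
```

Because the threshold scales with machine epsilon rather than demanding an exactly zero residual, ordinary round-off in a correct solve should pass, while a perturbation far above round-off level should be flagged. The value of `slack` is a tuning assumption in this sketch, not a figure taken from the paper.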

Citation:
Daniel Boley, Gene H. Golub, Samy Makar, Nirmal Saxena, Edward J. McCluskey, "Floating Point Fault Tolerance with Backward Error Assertions," IEEE Transactions on Computers, vol. 44, no. 2, pp. 302-311, Feb. 1995, doi:10.1109/12.364541