
ASCII Text
Amitabh Mishra, Prithviraj Banerjee, "An Algorithm-Based Error Detection Scheme for the Multigrid Method," IEEE Transactions on Computers, vol. 52, no. 9, pp. 1089-1099, September 2003.
BibTeX
@article{ 10.1109/TC.2003.1228507,
  author = {Amitabh Mishra and Prithviraj Banerjee},
  title = {An Algorithm-Based Error Detection Scheme for the Multigrid Method},
  journal = {IEEE Transactions on Computers},
  volume = {52},
  number = {9},
  issn = {0018-9340},
  year = {2003},
  pages = {1089--1099},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TC.2003.1228507},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Computers
TI  - An Algorithm-Based Error Detection Scheme for the Multigrid Method
IS  - 9
SN  - 0018-9340
SP  - 1089
EP  - 1099
EPD - 1089-1099
A1  - Amitabh Mishra
A1  - Prithviraj Banerjee
PY  - 2003
KW  - Algorithm-Based Fault Tolerance
KW  - multigrid method
KW  - rounding error analysis
KW  - parallel
KW  - error detection
KW  - partial differential equations
VL  - 52
JA  - IEEE Transactions on Computers
ER  -
Abstract: Algorithm-based fault tolerance (ABFT) is a technique that provides system-level error detection and correction on array processors and multiprocessors at low cost. Since the early 1980s, the technique has been applied extensively to linear algebraic algorithms such as matrix multiplication, Gaussian elimination, QR factorization, and singular value decomposition. However, an important class of problems in numerical linear algebra has been overlooked: the iterative solution of the linear algebraic equations that arise from finite difference or finite element discretization of a partial differential equation. The only exception is the recent application of algorithm-based error detection (ABED) encodings to the successive overrelaxation algorithm for Laplace's equation. In this paper, ABED is applied to a multigrid algorithm for the iterative solution of a Poisson equation in two dimensions. Invariants are created to implement checking in the relaxation, restriction, and interpolation operators. Roundoff errors accumulated within the operators modify the invariants and often lead to false alarms; this is addressed by deriving expressions for the roundoff errors in the algebraic processes in the operators and correcting the invariants accordingly. The ABED-encoded multigrid algorithm is shown to be insensitive to the size and range of the input data, and it provides excellent error coverage at low latency for floating-point, integer, and memory errors.
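To make the idea of an invariant with a roundoff tolerance concrete, here is a minimal sketch of a checksum check on a single relaxation sweep. It is not the paper's scheme: the abstract does not give the invariants or the derived roundoff expressions, so the Jacobi sweep, the sum-of-interior checksum, the function names, and the crude tolerance constant are all assumptions chosen for illustration.

```python
import math
import sys


def jacobi_sweep(u, f, h):
    """One Jacobi relaxation sweep for the 2D Poisson problem on a square
    grid with spacing h. Boundary values are left unchanged."""
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
    return new


def interior_sum(g):
    """Checksum: sum of all interior grid values."""
    n = len(g)
    return sum(g[i][j] for i in range(1, n - 1) for j in range(1, n - 1))


def predicted_checksum(u, f, h):
    """Predict the interior checksum of the sweep output from row and
    column sums of the input, without repeating the stencil itself."""
    n = len(u)
    row = [sum(u[i][1:n - 1]) for i in range(n)]
    col = [sum(u[i][j] for i in range(1, n - 1)) for j in range(n)]
    neighbours = (sum(row[0:n - 2]) + sum(row[2:n])
                  + sum(col[0:n - 2]) + sum(col[2:n]))
    return 0.25 * (neighbours + h * h * interior_sum(f))


def tolerance(u, f, h):
    """Crude roundoff allowance (an assumption, not a derived bound):
    proportional to machine epsilon, grid size, and data magnitude."""
    n = len(u)
    magnitude = (sum(abs(x) for r in u for x in r)
                 + h * h * sum(abs(x) for r in f for x in r))
    return 8.0 * n * n * sys.float_info.epsilon * magnitude


def sweep_is_consistent(u, f, h, new):
    """True if the sweep output matches the predicted checksum within
    the roundoff allowance; False signals a suspected error."""
    gap = abs(interior_sum(new) - predicted_checksum(u, f, h))
    return gap <= tolerance(u, f, h)


# Small demonstration on a hypothetical 6 x 6 grid.
n = 6
h = 1.0 / (n - 1)
u = [[math.sin(i) + 0.5 * j for j in range(n)] for i in range(n)]
f = [[1.0] * n for _ in range(n)]

new = jacobi_sweep(u, f, h)
clean_ok = sweep_is_consistent(u, f, h, new)

new[2][3] += 1e-3  # inject a simulated fault into one grid value
faulty_ok = sweep_is_consistent(u, f, h, new)

print(clean_ok, faulty_ok)
```

The tolerance is what separates real errors from false alarms: set it to zero and ordinary roundoff would usually trip the check, set it too high and small faults go unnoticed. Scaling it by the magnitude of the data is one simple way to keep the check from depending on the range of the inputs.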