A New Error Analysis Based Method for Tolerance Computation for Algorithm-Based Checks
February 1996 (vol. 45 no. 2)
pp. 238-243

Abstract: Algorithm-based techniques work by checking that certain properties of the global data are preserved after a set of computations. This often involves introducing a check variable that is updated so that, in the absence of roundoff errors, it equals the value of some function of all the data elements participating in the algorithm. However, because roundoff errors accumulate differently in the updates of the check variables and in the computations on the data elements, the equality is highly unlikely to hold exactly when the algorithm is implemented on a real computer. The check step therefore verifies that the equality holds to within a tolerance. In this brief contribution, we propose a method for determining the tolerance based on error analysis techniques. We present results on three numerical algorithms that show the effectiveness of our approach for data sets of varying sizes and data ranges.
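To make the check-variable idea concrete, here is a minimal Python sketch of a column-checksum test for a matrix product C = AB, in the style of Huang and Abraham's checksum encoding. It is not the method proposed in this paper: the tolerance is assembled from textbook worst-case rounding-error bounds (the constants gamma_n = nu/(1 - nu) for recursive summation and dot products), assuming IEEE double precision, round-to-nearest, left-to-right accumulation, and no underflow or overflow. All function names (`gamma`, `rsum`, `dot`, `matmul`, `column_checks`), the `safety` factor, and the sample data are hypothetical.

```python
import random
import sys

# Unit roundoff for IEEE 754 double precision with round-to-nearest: u = eps / 2.
U = sys.float_info.epsilon / 2


def gamma(n: int) -> float:
    """Textbook constant gamma_n = n*u / (1 - n*u) used in a priori error bounds."""
    return n * U / (1 - n * U)


def rsum(xs) -> float:
    """Left-to-right summation; error <= gamma_{k-1} * sum|x| for k terms."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def dot(x, y) -> float:
    """Left-to-right dot product; error <= gamma_n * sum|x_i||y_i| for length n."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def matmul(A, B):
    """Plain product C = A B of an m x n matrix A and an n x p matrix B."""
    cols = [list(col) for col in zip(*B)]
    return [[dot(row, col) for col in cols] for row in A]


def column_checks(A, B, C, safety=1.01):
    """Hypothetical column-checksum test for C = A B.

    The check variable for column j is (1^T A) B[:, j], computed from a checksum
    row of A; in exact arithmetic it equals the column sum of C. Returns one
    (passed, discrepancy, tolerance) tuple per column of C.
    """
    m, n, p = len(A), len(B), len(B[0])
    cols_A = [list(col) for col in zip(*A)]
    s = [rsum(col) for col in cols_A]                      # checksum row of A
    abs_s = [abs(x) for x in s]
    abs_colsum_A = [rsum(abs(a) for a in col) for col in cols_A]
    abs_A = [[abs(a) for a in row] for row in A]
    results = []
    for j in range(p):
        b_j = [B[k][j] for k in range(n)]
        abs_b_j = [abs(b) for b in b_j]
        check = dot(s, b_j)                                # checksum side
        total = rsum(C[i][j] for i in range(m))            # data side
        # Data side: rounding inside each C[i][j] plus rounding in the column sum.
        data_err = (gamma(n) * rsum(dot(row, abs_b_j) for row in abs_A)
                    + gamma(m - 1) * rsum(abs(C[i][j]) for i in range(m)))
        # Checksum side: rounding in s plus rounding in the final dot product.
        check_err = (gamma(m - 1) * dot(abs_colsum_A, abs_b_j)
                     + gamma(n) * dot(abs_s, abs_b_j))
        # Small inflation absorbs rounding in evaluating the bound itself.
        tol = safety * (data_err + check_err)
        diff = abs(check - total)
        results.append((diff <= tol, diff, tol))
    return results


if __name__ == "__main__":
    random.seed(0)
    A = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(6)]
    B = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    C = matmul(A, B)
    print(all(ok for ok, _, _ in column_checks(A, B, C)))  # fault-free: True
    C[2][1] += 1e-3                                         # inject a fault
    print([ok for ok, _, _ in column_checks(A, B, C)])      # [True, False, True, True]
```

Because the tolerance scales with |A||B| rather than being a fixed constant, the same test applies to data of very different magnitudes. A priori worst-case bounds like these are rigorous under the stated assumptions but typically pessimistic, so they trade some detection sensitivity for freedom from false alarms.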

Index Terms:
Parallel algorithms, algorithm-based fault-tolerance, checksum encodings, check thresholding, roundoff error analysis.
Amber Roy-Chowdhury, Prithviraj Banerjee, "A New Error Analysis Based Method for Tolerance Computation for Algorithm-Based Checks," IEEE Transactions on Computers, vol. 45, no. 2, pp. 238-243, Feb. 1996, doi:10.1109/12.485376