Generalized Algorithm-Based Fault Tolerance: Error Correction via Kalman Estimation
June 1998 (vol. 47 no. 6)
pp. 639-655

Abstract: An extension to Algorithm-Based Fault Tolerance (ABFT) methodologies shows how parity values dictated by a real convolutional code can be employed by Kalman estimation techniques to perform real number correction for protecting linear processing systems. Intermittent failures appearing in the output samples are detected and corrected using only the syndromes normally generated in ABFT schemes. The algebraic structure of a real convolutional code provides the separation needed by recursive Kalman state estimators to effect mean-square error correction. State and parity measurement equations model faults and computational noise in both the linear processing and parity generation subassemblies and, in a departure from previous models, the noise sources are considered time-varying. The Kalman one-step estimator, which makes decisions based on all parity values up to the present point, is determined; it separates naturally into detection and correction operations, permitting corrective action only when the detection levels exceed thresholds based on roundoff noise energy. The detector/corrector uses efficient multirate block processing techniques as determined by the real convolutional code.

A smoothed fixed-lag Kalman estimator, which uses parity values for a fixed amount beyond the point of interest, is needed to complete the correction. It employs one-step estimator quantities, and implementation simplifications are possible. Examples showing the correction behavior and mean-square error performance are presented, and the size of the overhead calculations for detection and correction is estimated. A protected processing system is constructed by introducing additional subassemblies, mostly comparators, alongside the detection and correction parts already described. Under the usual assumption of at most a single subassembly failure, no improperly detected or corrected data leave the overall protected configuration.
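The split into a detection step and a correction step can be illustrated with a much simpler, static stand-in for the paper's recursive estimator. The sketch below is not the author's algorithm: it uses a fixed block parity matrix rather than a real convolutional code, and a single mean-square (Kalman-style) measurement update rather than a recursion or a fixed-lag smoother. The names `H`, `q`, `r`, `threshold`, and `detect_and_correct` are all hypothetical, and exactly two parity values are assumed so that the 2x2 inverse can be written out by hand.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def syndrome(H, y, parity):
    """Parity recomputed from the outputs minus the independently computed parity."""
    return [dot(row, y) - p for row, p in zip(H, parity)]


def detect_and_correct(H, y, parity, q, r, threshold):
    """Gate a mean-square error correction on syndrome energy.

    Assumed model (illustrative only): syndrome s = H e + v, where the
    output error e has prior covariance q*I and the roundoff noise v has
    covariance r*I. H must have exactly two rows.
    Returns (corrected_outputs, correction_applied).
    """
    s = syndrome(H, y, parity)

    # Detection: act only if syndrome energy exceeds the noise-based threshold.
    energy = dot(s, s)
    if energy <= threshold:
        return list(y), False

    # Syndrome covariance S = q*H*H^T + r*I, a symmetric 2x2 matrix.
    a = q * dot(H[0], H[0]) + r
    b = q * dot(H[0], H[1])
    d = q * dot(H[1], H[1]) + r
    det = a * d - b * b

    # w = S^{-1} s using the closed-form 2x2 inverse.
    w0 = (d * s[0] - b * s[1]) / det
    w1 = (-b * s[0] + a * s[1]) / det

    # Mean-square estimate of the error: e_hat = q * H^T * w.
    e_hat = [q * (H[0][i] * w0 + H[1][i] * w1) for i in range(len(y))]
    return [yi - ei for yi, ei in zip(y, e_hat)], True


if __name__ == "__main__":
    H = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
    truth = [1.0, 2.0, 3.0, 4.0]
    parity = [dot(row, truth) for row in H]  # parity from the fault-free path
    faulty = [1.0, 2.0, 8.0, 4.0]            # one corrupted output sample
    print(detect_and_correct(H, faulty, parity, q=100.0, r=1e-6, threshold=1e-3))
```

With an isotropic prior and only two parity values, this estimate spreads the correction over all outputs, so it reduces the squared error without removing it. Localizing the error to the faulty sample is what the code structure and the recursion over time provide in the paper's scheme.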

Index Terms:
Algorithm-based fault tolerance, fault-tolerant linear processing, Kalman recursive filtering, mean-square error estimation, real convolutional codes, real number error correction, time-varying fault models, totally self-checking comparators.
Citation:
G. Robert Redinbo, "Generalized Algorithm-Based Fault Tolerance: Error Correction via Kalman Estimation," IEEE Transactions on Computers, vol. 47, no. 6, pp. 639-655, June 1998, doi:10.1109/12.689644