
B. Vinnakota, N.K. Jha, "Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems," IEEE Transactions on Computers, vol. 42, no. 8, pp. 924-937, August 1993.

@article{10.1109/12.238483,
  author = {B. Vinnakota and N.K. Jha},
  title = {Diagnosability and Diagnosis of Algorithm-Based Fault-Tolerant Systems},
  journal = {IEEE Transactions on Computers},
  volume = {42},
  number = {8},
  issn = {0018-9340},
  year = {1993},
  pages = {924--937},
  doi = {http://doi.ieeecomputersociety.org/10.1109/12.238483},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}

Keywords: diagnosability; parallel processing architectures; diagnosis; algorithm-based fault-tolerant systems; signal processing; faulty processors; concurrent error detection; fault location scheme; multiprocessor systems; system-level diagnosis; fault tolerant computing; parallel processing.

Parallel processing architectures are commonly used for signal processing and other computationally intensive applications. These applications are characterized by high throughput and long processing periods, characteristics that decrease the reliability of high-performance architectures. Erroneous data produced by faulty processors could have damaging consequences, particularly in critical real-time applications. It is therefore desirable that any erroneous data produced by the system be detected and located as quickly as possible. Algorithm-based fault tolerance (ABFT) is a low-cost, system-level concurrent error detection and fault location scheme. Methods from the system-level diagnosis of multiprocessor systems are applied here to the analysis of ABFT systems. Using these methods, a new algorithm is developed for analyzing the fault diagnosability of an ABFT system. Based on this work, a fault diagnosis algorithm for ABFT systems is then developed.
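The abstract describes ABFT only as "concurrent error detection and fault location". A minimal Python sketch of what that means for one operation follows. It assumes the classic row/column checksum encoding for matrix multiplication (the Huang and Abraham style of ABFT, not the diagnosability or diagnosis algorithm developed in this paper), integer data so that checksum comparisons are exact, and at most one erroneous output element. The function names `column_checksum`, `row_checksum`, `matmul` and `locate_error` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]  # integer entries keep every checksum comparison exact


def column_checksum(a: Matrix) -> Matrix:
    """Encode A by appending a row of column sums (column-checksum matrix)."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b: Matrix) -> Matrix:
    """Encode B by appending each row's sum as an extra column (row-checksum matrix)."""
    return [row + [sum(row)] for row in b]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; in an ABFT array this work is spread over processors."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def locate_error(c: Matrix) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product and locate a single erroneous element.

    c has n data rows plus a checksum row, and m data columns plus a checksum
    column. Returns None if every row and column check passes, otherwise the
    (row, col) of the element judged faulty under the single-error assumption.
    The corner element (the total sum) is not examined by these checks.
    """
    n, m = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m) if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]  # data element at the intersection
    if len(bad_rows) == 1 and not bad_cols:
        return bad_rows[0], m  # the row-checksum element itself is wrong
    if len(bad_cols) == 1 and not bad_rows:
        return n, bad_cols[0]  # the column-checksum element itself is wrong
    # Several checks fail: an error is detected, but this simple rule cannot place it.
    raise ValueError(f"error detected but not locatable: rows {bad_rows}, cols {bad_cols}")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = matmul(column_checksum(a), row_checksum(b))
    print(locate_error(c))  # None: all checks pass
    c[1][0] += 5            # simulate a faulty processor corrupting C[1][0]
    print(locate_error(c))  # (1, 0)
```

Intersecting the failing row check with the failing column check is what turns detection into location. Mapping each located element back to the processor that computed it, and deciding which fault sets stay distinguishable when the checks themselves may run on faulty processors, is the diagnosability question the paper addresses; this sketch does not model it.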