
V. Vinnakota, N.K. Jha, "Design of Algorithm-Based Fault-Tolerant Multiprocessor Systems for Concurrent Error Detection and Fault Diagnosis," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 10, pp. 1099-1106, October 1994. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.313125. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.

Index Terms: fault tolerant computing; reliability; multiprocessing systems; fault location; parallel architectures; system recovery; fault-tolerant multiprocessor systems; algorithm-based multiprocessor systems; concurrent error detection; fault diagnosis; algorithm-based fault tolerance; low-overhead system-level error detection; fault location scheme; ABFT systems; design procedure; data element sharing; ABFT system design

Algorithm-based fault tolerance (ABFT) is a low-overhead, system-level scheme for concurrent error detection and fault location in multiprocessor systems. We present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the system designer can control the type of checks used in the final system. We also present some new bounds on the number of checks needed in ABFT system design.