Shalini Yajnik, Niraj K. Jha, "Analysis and Randomized Design of Algorithm-Based Fault Tolerant Multiprocessor Systems Under an Extended Model," IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 7, pp. 757-768, July 1997. ISSN 1045-9219, doi:10.1109/71.598349.

Keywords: algorithm-based fault tolerance, concurrent error detection, concurrent fault location, randomized algorithms, fault diagnosis, transient faults.
Abstract—The reliability of compute-intensive applications can be improved by introducing fault tolerance into the system. Algorithm-based fault tolerance (ABFT) is a low-cost scheme that provides the required fault tolerance through system-level encoding. In this paper, we propose randomized construction techniques, under an extended model, for designing ABFT systems with the required fault tolerance capability. The extended model also accounts for failures in the processors that perform the checking operations.
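The abstract does not spell out the construction, so what follows is only a minimal sketch of the general idea of randomized check design when checking processors can fail. It is not the authors' algorithm. It assumes a simplified model, and every name and parameter in it is hypothetical. Data elements are produced by processors. Each check covers a set of data elements and is evaluated on some processor. A check evaluated on a faulty processor may stay silent. A check evaluated on a fault-free processor flags any error in the data it covers, so error masking is ignored. Each data element joins each check independently with probability `p`. A design is accepted only after exhaustively verifying that every set of at most `t` faulty processors is detected.

```python
import itertools
import random


def random_design(num_procs, data_per_proc, num_checks, p, rng):
    """Hypothetical randomized construction (an assumption, not the paper's method).

    Data elements are assigned to processors in contiguous blocks, check k is
    evaluated on processor k % num_procs, and each data element joins each
    check independently with probability p.
    """
    num_data = num_procs * data_per_proc
    data_owner = [d // data_per_proc for d in range(num_data)]
    check_owner = [k % num_procs for k in range(num_checks)]
    checks = [{d for d in range(num_data) if rng.random() < p} for _ in range(num_checks)]
    return data_owner, checks, check_owner


def detects_fault_set(faulty, data_owner, checks, check_owner):
    """Worst-case detection test under a simplified extended model.

    Assumptions: a faulty processor may corrupt any one of its data elements,
    checks evaluated on faulty processors may stay silent, and a check on a
    fault-free processor flags any error in the data it covers. The adversary
    corrupts a single element, so every element owned by a faulty processor
    must be covered by at least one check evaluated on a fault-free processor.
    """
    trusted = [c for c, owner in zip(checks, check_owner) if owner not in faulty]
    covered = set().union(*trusted)
    return all(d in covered for d, owner in enumerate(data_owner) if owner in faulty)


def is_t_fault_detecting(t, num_procs, data_owner, checks, check_owner):
    """Exhaustively test every nonempty set of at most t faulty processors."""
    for size in range(1, t + 1):
        for faulty in itertools.combinations(range(num_procs), size):
            if not detects_fault_set(set(faulty), data_owner, checks, check_owner):
                return False
    return True


def randomized_design(t, num_procs, data_per_proc, num_checks, p, seed=0, max_tries=100):
    """Draw random designs until one is verified t-fault-detecting (Las Vegas style).

    Returns (attempts, design) on success, or None if max_tries is exhausted.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        design = random_design(num_procs, data_per_proc, num_checks, p, rng)
        if is_t_fault_detecting(t, num_procs, *design):
            return attempt, design
    return None


if __name__ == "__main__":
    result = randomized_design(t=2, num_procs=8, data_per_proc=4, num_checks=16, p=0.5)
    if result is None:
        print("no verified design found")
    else:
        attempts, (_, checks, _) = result
        print(f"verified 2-fault-detecting design after {attempts} attempt(s)")
        print("check sizes:", [len(c) for c in checks])
```

The exhaustive verification keeps the sketch honest. Random construction alone gives only a probabilistic guarantee, and redrawing designs until one passes turns it into a Las Vegas procedure. The cost of verification grows with the number of fault sets of size at most `t`, so this form is only practical for small systems.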

