Fault Tolerance in Multiprocessor Systems Without Dedicated Redundancy
March 1988 (vol. 37 no. 3)
pp. 358-362
An algorithm called RAFT (recursive algorithm for fault tolerance) for achieving fault tolerance in multiprocessor systems is described. Through the use of a combination of dynamic space- and time-redundancy techniques, RAFT achieves fault tolerance in the presence of permanent as well as intermittent faults. Performance and reliability of multiprocessor systems using RAFT are determined as a

Index Terms:
fault tolerance; dynamic space redundancy; multiprocessor systems; RAFT; recursive algorithm for fault tolerance; time-redundancy techniques; triple modular redundancy; fault tolerant computing; multiprocessing systems.
Citation:
P. Agrawal, "Fault Tolerance in Multiprocessor Systems Without Dedicated Redundancy," IEEE Transactions on Computers, vol. 37, no. 3, pp. 358-362, March 1988, doi:10.1109/12.2174