Nest: A Nested-Predicate Scheme for Fault Tolerance
November 1993 (vol. 42 no. 11)
pp. 1303-1324

Introduces Nest, a nested-predicate scheme for fault tolerance. Nest provides a formal, comprehensive model for fault-tolerant parallel algorithms and a general methodology for designing reliable applications for multiprocessor systems. The model formalizes fault-tolerance concepts by means of three nested system predicates and the properties that govern their interrelationships. This rigorous framework facilitates the study of the specific properties that enable an algorithm to tolerate faults. On that basis, the paper outlines systematic design techniques that add fault tolerance to algorithms while preserving their functional characteristics.
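
To make the idea of nested predicates concrete, the following is a minimal sketch and not the paper's formalization. The abstract does not define the three predicates, so the toy integer state, the names `inner`, `middle`, and `outer`, and the helpers `is_nested` and `innermost_level` are all assumptions introduced here purely for illustration. The sketch only shows what it means for each predicate to imply the next, looser one over a finite state space.

```python
from typing import Callable, Iterable, Optional, Sequence

State = int  # assumption: a toy state is a single integer
Predicate = Callable[[State], bool]


# Hypothetical predicates, ordered from innermost (strictest) to outermost.
def inner(s: State) -> bool:
    return s == 0


def middle(s: State) -> bool:
    return abs(s) <= 2


def outer(s: State) -> bool:
    return abs(s) <= 10


LEVELS: Sequence[Predicate] = (inner, middle, outer)


def is_nested(levels: Sequence[Predicate], states: Iterable[State]) -> bool:
    """Check that each predicate implies the next one on every given state."""
    state_list = list(states)
    return all(
        (not tighter(s)) or looser(s)
        for tighter, looser in zip(levels, levels[1:])
        for s in state_list
    )


def innermost_level(levels: Sequence[Predicate], s: State) -> Optional[int]:
    """Return the index of the strictest predicate that holds, or None."""
    for index, predicate in enumerate(levels):
        if predicate(s):
            return index
    return None
```

Calling `is_nested(LEVELS, range(-20, 21))` checks the implication chain over a small state space, and `innermost_level(LEVELS, s)` reports how far inside the nesting a given state lies. How the actual Nest predicates relate to faults and recovery is defined in the paper itself and is not reproduced here.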

Index Terms:
nested-predicate scheme; fault tolerance; Nest; parallel algorithms; multiprocessor systems; formalization; nested system predicates; rigorous framework; cost/benefit comparison; design methodology; natural redundancy; fault tolerant computing; redundancy.
Citation:
L.A. Laranjeira, M. Malek, R. Jenevein, "Nest: A Nested-Predicate Scheme for Fault Tolerance," IEEE Transactions on Computers, vol. 42, no. 11, pp. 1303-1324, Nov. 1993, doi:10.1109/12.247836