Designing Masking Fault-Tolerance via Nonmasking Fault-Tolerance
June 1998 (vol. 24 no. 6)
pp. 435-450

Abstract: Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. Nonmasking fault-tolerance guarantees less: it ensures only that, once faults stop occurring, program executions converge to states from which programs continually (re)satisfy their specification. In this paper, we present a component-based method for the design of masking fault-tolerant programs. In this method, components are added to a fault-intolerant program in a stepwise manner, first to transform the fault-intolerant program into a nonmasking fault-tolerant one and then to enhance the fault-tolerance from nonmasking to masking. We illustrate the method by designing programs for agreement in the presence of Byzantine faults, data transfer in the presence of message loss, triple modular redundancy in the presence of input corruption, and mutual exclusion in the presence of process fail-stops. These examples also demonstrate that the method accommodates a variety of fault classes. The method provides alternative designs for programs usually designed with extant design methods, and it offers the potential for improved masking fault-tolerant programs.
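
As a rough illustration of the distinction the abstract draws, the following Python sketch uses the triple modular redundancy setting (three inputs, at most one of them corrupted). It is not the program from the paper: the function names, the use of a list as a "trace" of the values the output takes over time, and the way the corrector and detector are written are all assumptions made for this example.

```python
from collections import Counter


def majority(inputs):
    """Return the value held by at least two of the three inputs.

    Assumption for this sketch: at most one input is corrupted.
    """
    value, count = Counter(inputs).most_common(1)[0]
    if count < 2:
        raise ValueError("more than one input appears corrupted")
    return value


def intolerant(inputs):
    """Fault-intolerant program: copy the first input to the output."""
    return [inputs[0]]  # trace of values the output takes


def nonmasking(inputs):
    """Intolerant program plus a corrector.

    The output may be wrong at first; the corrector then repairs it,
    so the trace converges to the correct value.
    """
    trace = [inputs[0]]
    correct = majority(inputs)
    if trace[-1] != correct:
        trace.append(correct)  # corrector step
    return trace


def masking(inputs):
    """Nonmasking program plus a detector.

    The detector allows a write to the output only when a second input
    confirms the candidate value, so a wrong value is never exposed.
    """
    for i, candidate in enumerate(inputs):
        confirmed = any(candidate == other
                        for j, other in enumerate(inputs) if j != i)
        if confirmed:  # detector check passed
            return [candidate]
    raise ValueError("more than one input appears corrupted")


if __name__ == "__main__":
    corrupted = [9, 4, 4]  # first input corrupted, correct value is 4
    print(intolerant(corrupted))
    print(nonmasking(corrupted))
    print(masking(corrupted))
```

In this sketch, the intolerant version exposes the corrupted value and keeps it, the nonmasking version exposes it briefly before converging, and the masking version never exposes it. This mirrors the stepwise idea of adding a corrector first and a detector second.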

Index Terms:
Masking and nonmasking fault-tolerance, component-based design, correctors, detectors, stepwise design, formal methods, distributed systems.
Anish Arora, Sandeep S. Kulkarni, "Designing Masking Fault-Tolerance via Nonmasking Fault-Tolerance," IEEE Transactions on Software Engineering, vol. 24, no. 6, pp. 435-450, June 1998, doi:10.1109/32.689401