
A. Arora and M. Gouda, "Closure and Convergence: A Foundation of Fault-Tolerant Computing," IEEE Transactions on Software Engineering, vol. 19, no. 11, pp. 1015-1027, November 1993. ISSN 0098-5589. doi: 10.1109/32.256850. IEEE Computer Society, Los Alamitos, CA, USA.

Keywords: fault-tolerant computing; legal states; convergence; closure; verification; formal verification

The authors formally define what it means for a system to tolerate a class of faults. The definition consists of two conditions. The first is that if a fault occurs when the system state is within the set of legal states, the resulting state is within some larger set and, if faults continue to occur, the system state remains within that larger set (closure). The second is that if faults stop occurring, the system eventually reaches a state within the legal set (convergence). The applicability of the definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems is demonstrated. Using the definition, the authors obtain a simple classification of fault-tolerant systems. Methods for the systematic design of such systems are discussed.
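
The two conditions can be illustrated on a small finite-state system. The following is a minimal sketch, not the authors' formalism: the names `is_closed`, `converges`, `tolerates`, `step_down`, and `bump`, the integer state space, and the choice to model actions as functions returning sets of successor states are all assumptions made for illustration. It also assumes the legal set itself is closed under program actions, which the abstract does not state explicitly.

```python
from typing import AbstractSet, Callable, Iterable, Set

State = int
# Hypothetical modelling choice: an action maps a state to its possible successors.
Action = Callable[[State], Set[State]]


def is_closed(states: AbstractSet[State], actions: Iterable[Action]) -> bool:
    """Closure: no action can take a state in `states` outside of `states`."""
    actions = list(actions)
    return all(nxt in states for s in states for act in actions for nxt in act(s))


def converges(span: AbstractSet[State], legal: AbstractSet[State],
              program: Iterable[Action]) -> bool:
    """Convergence: with faults stopped, every program-only computation that
    starts in `span` reaches `legal`. Computed as a least fixpoint over a
    finite state space: a state is good if it is legal, or if it has at least
    one successor and all of its successors are already good."""
    program = list(program)
    good = set(legal)
    changed = True
    while changed:
        changed = False
        for s in span - good:  # a fresh set, so adding to `good` below is safe
            successors = set().union(*(act(s) for act in program))
            if successors and successors <= good:
                good.add(s)
                changed = True
    return span <= good


def tolerates(legal, span, program, faults) -> bool:
    """Both conditions together, for a legal set inside a larger fault span."""
    program, faults = list(program), list(faults)
    return (legal <= span
            and is_closed(legal, program)          # assumed: legal set is closed
            and is_closed(span, program + faults)  # closure under faults
            and converges(span, legal, program))   # convergence once faults stop


# Toy system (illustrative only): a counter that should rest at 0.
LEGAL = frozenset({0})
SPAN = frozenset({0, 1, 2, 3})


def step_down(x: State) -> Set[State]:
    """Program action: count down toward 0 and stay there."""
    return {x - 1} if x > 0 else {x}


def bump(x: State) -> Set[State]:
    """Fault action: corrupt the counter upward, but never beyond 3."""
    return {x + 1} if x < 3 else {x}


def stuck_at_three(x: State) -> Set[State]:
    """A broken program variant that never leaves state 3."""
    return {x} if x == 3 else step_down(x)


if __name__ == "__main__":
    print(tolerates(LEGAL, SPAN, [step_down], [bump]))
    print(tolerates(LEGAL, SPAN, [stuck_at_three], [bump]))
```

In this toy system the fault can push the counter anywhere in {0, 1, 2, 3} but never outside it, and once faults stop the countdown returns to 0, so both conditions hold. The `stuck_at_three` variant keeps closure but loses convergence, because a computation starting at 3 never reaches the legal set.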