New Hybrid Fault Models for Asynchronous Approximate Agreement
April 1996 (vol. 45 no. 4)
pp. 439-449

Abstract—An important problem in fault-tolerant distributed systems is maintaining agreement between nonfaulty processes in the presence of undiagnosed faults. To achieve agreement, processes exchange their local "opinions" of a particular value, and then vote on the values received to arrive at a "consensus." Approximate Agreement defines a condition in which it is not necessary for consensus values to be identical. Rather, it is only necessary that they agree to within a predefined tolerance.

Approximate Agreement can be achieved through a sequence of convergent voting rounds, in which the range of values held by nonfaulty nodes is reduced in each round. Recent research has revealed simple expressions for the convergence rate and fault tolerance of a broad family of convergent voting algorithms called Mean-Subsequence-Reduced (MSR) algorithms. These results were derived under the Thambidurai and Park hybrid fault model, which comprises asymmetric, symmetric, and benign faults. However, these results apply only to synchronous systems, in which there is a known finite bound on computation and communication times. This paper extends the previous results to asynchronous systems, in which no such bound exists. In addition, we introduce two new hybrid fault models which further differentiate between omissive faults and transmissive faults. The new fault models permit tighter bounds on the fault tolerance of asynchronous systems to be derived.
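
As a rough illustration of a single convergent voting round, the sketch below follows the general Mean-Subsequence-Reduced pattern: sort the received values, discard a fixed number of extremes from each end, select a subsequence of what remains, and take its mean. It is a simplified, synchronous toy and not the algorithm analyzed in the paper; the names `msr_round`, `spread`, `tau`, and `select` are hypothetical, and the asynchronous case, in which a node cannot wait for every value, is not modeled.

```python
def msr_round(received, tau, select=lambda reduced: reduced):
    """One voting round at a single node (illustrative sketch only).

    received: values this node collected in the round
    tau:      number of extreme values discarded from each end (assumed parameter)
    select:   picks a subsequence of the reduced list; default keeps all of it
    """
    ordered = sorted(received)
    if len(ordered) <= 2 * tau:
        raise ValueError("need more than 2*tau values to reduce")
    reduced = ordered[tau:len(ordered) - tau]   # drop tau lowest and tau highest
    chosen = select(reduced)
    return sum(chosen) / len(chosen)            # mean of the selected subsequence


def spread(values):
    """Range of a set of values; agreement holds once this is within tolerance."""
    return max(values) - min(values)


# Three nonfaulty nodes hold 0, 5, and 10. A fourth, asymmetrically faulty
# node sends a different value to each receiver (made-up numbers).
good = [0, 5, 10]
lies = [-100, 100, 100]
new_values = [msr_round(good + [lie], tau=1) for lie in lies]
print(spread(good), "->", spread(new_values))   # spread shrinks from 10 to 5.0
```

In this toy run the nonfaulty values move from a spread of 10 to a spread of 5 after one round, despite the inconsistent values sent by the faulty node. Repeating such rounds until the spread falls within the predefined tolerance is the sense in which the voting is convergent.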

Index Terms:
Approximate agreement, Byzantine agreement, clock synchronization, convergent voting algorithms, fault-tolerant multiprocessors, hybrid faults.
Citation:
M.H. Azadmanesh, R.M. Kieckhafer, "New Hybrid Fault Models for Asynchronous Approximate Agreement," IEEE Transactions on Computers, vol. 45, no. 4, pp. 439-449, April 1996, doi:10.1109/12.494101