Exploiting Omissive Faults in Synchronous Approximate Agreement
October 2000 (vol. 49 no. 10)
pp. 1031-1042

Abstract—In a fault-tolerant distributed system, it is often necessary for nonfaulty processes to agree on the value of a shared data item. The criterion of Approximate Agreement does not require processes to achieve exact agreement on a value; rather, they need only agree to within a predefined numerical tolerance. Approximate Agreement can be achieved through convergent voting algorithms. Previous research has studied convergent voting algorithms under mixed-mode or hybrid fault models, such as the Thambidurai and Park hybrid fault model, which consists of three fault modes: asymmetric, symmetric, and benign. This paper makes three major contributions to the state of the art in fault-tolerant convergent voting. 1) We partition both the asymmetric and symmetric fault modes into disjoint omissive and transmissive submodes. The resulting five-mode hybrid fault model is a superset of previous hybrid fault models. 2) We present a new family of voting algorithms, called Omission Mean Subsequence Reduced (OMSR), which implicitly recognize and exploit omissive behavior in malicious faults while still maintaining full Byzantine fault tolerance. 3) We show that OMSR voting algorithms are more fault-tolerant than previous voting algorithms if any of the currently active faults is omissive.
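The abstract gives no pseudocode, so what follows is only a minimal Python sketch under stated assumptions, meant to show why omissions can be exploited. `msr_vote` is one simple member of the Mean Subsequence Reduced style of convergent voting: discard the `tau` smallest and `tau` largest received values and average the rest. `omission_aware_vote` is a hypothetical illustration, not the paper's OMSR definition. It assumes a synchronous round in which a value that fails to arrive (`None`) proves its sender faulty, and that `tau` bounds the total number of faulty senders, so each observed omission lets the receiver trim one fewer value from each end. All names (`FaultMode`, `msr_vote`, `omission_aware_vote`) are invented for this sketch.

```python
from enum import Enum
from typing import Optional, Sequence


class FaultMode(Enum):
    """The five modes described above: asymmetric and symmetric faults, each
    split into omissive and transmissive submodes, plus benign faults.
    (Labels are illustrative, not taken from the paper.)"""
    ASYMMETRIC_OMISSIVE = "asymmetric omissive"
    ASYMMETRIC_TRANSMISSIVE = "asymmetric transmissive"
    SYMMETRIC_OMISSIVE = "symmetric omissive"
    SYMMETRIC_TRANSMISSIVE = "symmetric transmissive"
    BENIGN = "benign"


def msr_vote(values: Sequence[float], tau: int) -> float:
    """Mean of the multiset after discarding the tau smallest and tau largest
    values. If at most tau values are faulty, the result lies within the
    range of the nonfaulty values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * tau:
        raise ValueError("too few values to discard 2*tau of them")
    kept = ordered[tau:len(ordered) - tau]
    return sum(kept) / len(kept)


def omission_aware_vote(received: Sequence[Optional[float]], tau: int) -> float:
    """Hypothetical omission-aware variant (an assumption, not the paper's OMSR).

    Assumes that a missing value (None) identifies its sender as faulty and
    that tau bounds all faulty senders, so at most tau - omissions of the
    values actually present can be faulty."""
    present = [v for v in received if v is not None]
    omissions = len(received) - len(present)
    # Each omission "uses up" one unit of the fault budget, so trim less.
    return msr_vote(present, max(tau - omissions, 0))
```

For example, with `tau = 2` and only four values actually received, `msr_vote([1.0, 2.0, 3.0, 4.0], 2)` raises `ValueError` because it cannot discard four values and still average anything, whereas `omission_aware_vote([1.0, None, 2.0, 3.0, 4.0], 2)` counts one omission, trims one value from each end, and returns 2.5.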

References:
[1] M.H. Azadmanesh and R.M. Kieckhafer, “Synchronous Approximate Agreement in the Presence of Transmissive and Omissive Failure Modes,” Technical Report no. TR-98-01, Dept. of Computer Science, Univ. of Nebraska at Omaha, 1998.
[2] G.J. Holzmann, “Early Fault Detection Tools,” Proc. Sixth Int'l Conf. Tools and Algorithms for the Construction and Analysis of Systems (TACAS '96), 1996.
[3] M.H. Azadmanesh and R.M. Kieckhafer, “New Hybrid Fault Models for Asynchronous Approximate Agreement,” IEEE Trans. Computers, vol. 45, no. 4, pp. 439-449, Apr. 1996.
[4] M.A. Barborak, M. Malek, and A.T. Dahbura, “The Consensus Problem in Fault-Tolerant Computing,” ACM Computing Surveys, vol. 25, pp. 171-220, June 1993.
[5] K. Birman, “Replication and Fault-Tolerance in the ISIS System,” Proc. 10th ACM Symp. Operating Systems Principles, pp. 79-86, Dec. 1985.
[6] F. Cristian, H. Aghili, R. Strong, and D. Dolev, “Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement,” Proc. 15th Int'l Symp. Fault-Tolerant Computing, pp. 200-206, June 1985.
[7] F. Cristian, H. Aghili, and R. Strong, “Clock Synchronization in the Presence of Omission and Performance Faults, and Processor Joins,” Proc. 16th Int'l Symp. Fault-Tolerant Computing, 1986.
[8] F. Cristian, D. Dolev, R. Strong, and H. Aghili, “Atomic Broadcast in a Real-Time Environment,” IBM Research Report, Almaden Research Center, 1989.
[9] D. Dolev, N.A. Lynch, S.S. Pinter, E.W. Stark, and W.E. Weihl, “Reaching Approximate Agreement in the Presence of Faults,” Proc. Third Symp. Reliability in Distributed Software and Database Systems, pp. 145-154, Oct. 1983.
[10] D. Dolev et al., “Reaching Approximate Agreement in the Presence of Faults,” J. ACM, pp. 499-516, July 1986.
[11] A.D. Fekete, “Asymptotically Optimal Algorithms for Approximate Agreement,” Distributed Computing, vol. 4, pp. 9-29, 1990.
[12] J.L.W. Kessels, “Two Designs of a Fault-Tolerant Clocking System,” IEEE Trans. Computers, vol. 33, no. 10, pp. 912-919, Oct. 1984.
[13] R.M. Kieckhafer, C.J. Walter, A.M. Finn, and P.M. Thambidurai, “The MAFT Architecture for Distributed Fault-Tolerance,” IEEE Trans. Computers, vol. 37, no. 4, pp. 398-405, Apr. 1988.
[14] R.M. Kieckhafer and M.H. Azadmanesh, “Reaching Approximate Agreement with Mixed Mode Faults,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 1, pp. 53-63, Jan. 1994.
[15] R.M. Kieckhafer and M.H. Azadmanesh, “Unified Approach to Synchronous and Asynchronous Approximate Agreement in the Presence of Hybrid Faults,” IEEE Trans. Reliability, vol. 44, no. 4, Dec. 1995.
[16] C.M. Krishna, K.G. Shin, and R.W. Butler, “Ensuring Fault Tolerance of Phase-Locked Clocks,” IEEE Trans. Computers, vol. 34, pp. 752-756, Aug. 1985.
[17] L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” ACM Trans. Programming Languages and Systems, vol. 4, no. 3, pp. 382-401, July 1982.
[18] L. Lamport and P.M. Melliar-Smith, “Synchronizing Clocks in the Presence of Faults,” J. ACM, vol. 32, no. 1, pp. 52-78, Jan. 1985.
[19] P. Lincoln and J. Rushby, “Formal Verification of an Algorithm for Interactive Consistency Under a Hybrid Fault Model,” Proc. Computer-Aided Verification (CAV '93), C. Courcoubetis, ed., pp. 292-304, Lecture Notes in Computer Science 697, Springer-Verlag, Elounda, Greece, June/July 1993.
[20] J. Lundelius and N. Lynch, “A New Fault-Tolerant Algorithm for Clock Synchronization,” Proc. Third Symp. Principles of Distributed Computing, pp. 75-87, Aug. 1984.
[21] F.J. Meyer and D.K. Pradhan, “Consensus with Dual Failure Modes,” Proc. 17th Fault-Tolerant Computing Symp., pp. 48-54, July 1987.
[22] M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” J. ACM, vol. 27, no. 2, pp. 228-234, Apr. 1980.
[23] R. Plunkett and A. Fekete, “Approximate Agreement with Mixed Mode Faults,” Proc. 12th Int'l Symp. Distributed Computing, pp. 333-346, Sept. 1998.
[24] J. Rushby, “A Formally Verified Algorithm for Clock Synchronization under a Hybrid Fault Model,” Proc. 13th ACM Symp. Principles of Distributed Computing, pp. 304-313, Los Angeles, Calif., Aug. 1994. Also available as NASA Contractor Report 198289.
[25] F.B. Schneider, “Understanding Protocols for Byzantine Clock Synchronization,” Report No. 87-859, Dept. of Computer Science, Cornell Univ., Aug. 1987.
[26] N. Suri, M. Hugue, and C. Walter, “Fault Classification and Distribution Effects on the Reliability Modeling of Large Fault-Tolerant Systems,” Proc. 22nd Fault-Tolerant Computing Symp., pp. 212-220, July 1992.
[27] P.M. Thambidurai and Y.K. Park, “Interactive Consistency with Multiple Failure Modes,” Proc. Seventh Reliable Distributed Systems Symp., Oct. 1988.
[28] P.M. Thambidurai, A.M. Finn, R.M. Kieckhafer, and C.J. Walter, “Clock Synchronization in MAFT,” Proc. 19th Fault-Tolerant Computing Symp., pp. 142-151, June 1989.
[29] C. Walter, M. Hugue, and N. Suri, “Continual On-Line Diagnosis of Hybrid Faults,” Dependable Computing for Critical Applications 4, F. Cristian, G. Le Lann, and T. Lunt, eds., pp. 233-250, 1995.

Index Terms:
Approximate agreement, clock synchronization, convergent voting algorithms, fault-tolerant distributed systems, hybrid faults.
M.H. Azadmanesh and R.M. Kieckhafer, “Exploiting Omissive Faults in Synchronous Approximate Agreement,” IEEE Transactions on Computers, vol. 49, no. 10, pp. 1031-1042, Oct. 2000, doi:10.1109/12.888039