Reaching Approximate Agreement with Mixed-Mode Faults
January 1994 (vol. 5 no. 1)
pp. 53-63

In a fault-tolerant distributed system, different non-faulty processes may arrive at different values for a given system parameter. To resolve this disagreement, processes must exchange and vote upon their respective local values. Faulty processes may attempt to inhibit agreement by acting in a malicious or "Byzantine" manner. Approximate agreement defines one form of agreement in which the voted values obtained by the non-faulty processes need not be identical. Instead, they need only agree to within a predefined tolerance. Approximate agreement can be achieved by a sequence of convergent voting rounds, in which the range of values held by non-faulty processes is reduced in each round. Historically, each new convergent voting algorithm has been accompanied by ad-hoc proofs of its convergence rate and fault-tolerance, using an overly conservative fault model in which all faults exhibit worst-case Byzantine behavior. This paper presents a general method to quickly determine convergence rate and fault-tolerance for any member of a broad family of convergent voting algorithms. This method is developed under a realistic mixed-mode fault model comprised of asymmetric, symmetric, and benign fault modes. These results are employed to more accurately analyze the properties of several existing voting algorithms, to derive a sub-family of optimal mixed-mode voting algorithms, and to quickly determine the properties of proposed new voting algorithms.
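As an illustration of what a sequence of convergent voting rounds can look like, the following minimal Python sketch simulates one simple voting rule: each non-faulty process sorts the values it received, discards the `tau` lowest and `tau` highest, and averages the rest. This is not necessarily one of the algorithms analyzed in the paper. The function names, the parameter `tau`, and the adversary behavior (an asymmetric fault that sends an extreme low value to some receivers and an extreme high value to others) are all assumptions made for the example.

```python
def trimmed_mean_vote(received, tau):
    """One vote: drop the tau lowest and tau highest values, average the rest.

    Illustrative voting rule (an assumption, not taken from the paper).
    """
    if len(received) <= 2 * tau:
        raise ValueError("need more than 2*tau values to vote")
    ordered = sorted(received)
    kept = ordered[tau:len(ordered) - tau]
    return sum(kept) / len(kept)


def simulate(good_values, tau, epsilon, max_rounds=100):
    """Repeat voting rounds until non-faulty values agree to within epsilon.

    Hypothetical adversary: tau asymmetric faulty senders give an extreme low
    value to even-numbered receivers and an extreme high value to odd ones.
    """
    values = list(good_values)
    rounds = 0
    while max(values) - min(values) > epsilon and rounds < max_rounds:
        lo, hi = min(values), max(values)
        next_values = []
        for receiver in range(len(values)):
            bad_value = lo - 100.0 if receiver % 2 == 0 else hi + 100.0
            received = values + [bad_value] * tau
            next_values.append(trimmed_mean_vote(received, tau))
        values = next_values
        rounds += 1
    return values, rounds


if __name__ == "__main__":
    final_values, rounds_used = simulate([0.0, 1.0, 2.0, 10.0], tau=1, epsilon=0.01)
    print(rounds_used, final_values)
```

In this sketch the faulty values are always trimmed away or bracketed by non-faulty ones, so every voted value stays inside the range of the previous non-faulty values, and that range shrinks each round until it falls within the tolerance `epsilon`.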

Index Terms:
fault tolerant computing; reliability; distributed processing; synchronisation; mixed-mode faults; approximate agreement; fault-tolerant distributed system; convergent voting algorithm; worst-case Byzantine behavior; voting algorithms
Citation:
R.M. Kieckhafer, M.H. Azadmanesh, "Reaching Approximate Agreement with Mixed-Mode Faults," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 1, pp. 53-63, Jan. 1994, doi:10.1109/71.262588