A Fault-Tolerant Protocol for Atomic Broadcast
July 1990 (vol. 1 no. 3)
pp. 271-285

A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, and delay of messages, as well as network partitioning, in an arbitrary network of fail-stop sites (i.e., no Byzantine site behavior is tolerated). The protocol is based on majority-consensus decisions to commit to a unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages, where N is the number of sites. This overhead is shared among all the messages covered by a delivery decision, so the heavier the broadcast message traffic, the lower the overhead per broadcast message. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis shows that the protocol commits with high probability under realistic operating conditions without invoking the termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable.
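
The majority-consensus idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's protocol: the function names `is_majority` and `partitions_that_can_commit` are hypothetical, and the sketch only shows why a strict-majority rule lets at most one side of a network partition commit an ordering.

```python
from typing import List


def is_majority(votes: int, n_sites: int) -> bool:
    """Return True when `votes` is a strict majority of `n_sites`."""
    return votes > n_sites // 2


def partitions_that_can_commit(partition_sizes: List[int]) -> List[int]:
    """Return the indices of partitions holding a strict majority of all sites.

    Assumption for illustration: every site belongs to exactly one partition,
    so the total number of sites is the sum of the partition sizes.
    """
    n_sites = sum(partition_sizes)
    return [
        index
        for index, size in enumerate(partition_sizes)
        if is_majority(size, n_sites)
    ]


if __name__ == "__main__":
    # Five sites split 3/2: only the first partition can commit.
    print(partitions_that_can_commit([3, 2]))
    # Four sites split 2/2: neither side holds a strict majority.
    print(partitions_that_can_commit([2, 2]))
```

Because two disjoint groups cannot both contain more than half of the sites, the returned list never has more than one entry, which is the property a majority-based commit decision relies on.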

[1] A. Demers et al., "Epidemic algorithms for replicated database management," in Proc. 6th ACM Symp. Principles Distributed Comput., 1987, pp. 1-12.
[2] H. Garcia-Molina and B. Kogan, "An implementation of reliable broadcast using an unreliable multicast facility," in Proc. 7th IEEE Symp. Reliable Distributed Syst., Columbus, OH, Oct. 1988, pp. 101-111.
[3] K. P. Birman and T. A. Joseph, "Reliable communication in the presence of failures," ACM Trans. Comput. Syst., vol. 5, no. 1, pp. 47-76, Feb. 1987.
[4] J. Chang and N. F. Maxemchuk, "Reliable broadcast protocols," ACM Trans. Comput. Syst., vol. 2, no. 3, pp. 251-273, Aug. 1984.
[5] F. Cristian, H. Aghili, and R. Strong, "Atomic broadcast: From simple message diffusion to Byzantine agreement," in Proc. FTCS-15, Ann Arbor, MI, June 1985, pp. 200-206.
[6] P. M. Melliar-Smith, L. E. Moser, and V. Agrawala, "Broadcast protocols for distributed systems," IEEE Trans. Parallel Distributed Syst., vol. 1, no. 1, pp. 17-25, Jan. 1990.
[7] L. L. Peterson, N. Buchholz, and R. D. Schlichting, "Preserving and using context information in interprocess communication," ACM Trans. Comput. Syst., vol. 7, no. 3, pp. 217-246, Aug. 1989.
[8] H. Garcia-Molina and A. Spauster, "Message ordering in a multicast environment," in Proc. 9th Int. Conf. on Distr. Comput. Syst. (Newport Beach, CA), June 1989, pp. 354-361.
[9] R. Thomas, "A majority consensus approach to concurrency control," ACM Trans. Database Syst., vol. 4, pp. 180-209, June 1979.
[10] P. A. Bernstein and N. Goodman, "Multiversion concurrency control," ACM Trans. Database Syst., vol. 8, no. 4, pp. 465-483, Dec. 1983.
[11] A. El Abbadi, D. Skeen, and F. Cristian, "An efficient, fault-tolerant protocol for replicated data management," in Proc. 4th ACM SIGACT-SIGMOD Symp. Principles Database Syst., Portland, OR, Mar. 1985, pp. 215-228.
[12] A. El Abbadi and S. Toueg, "Maintaining availability in partitioned replicated databases," ACM Trans. Database Syst., vol. 14, no. 2, pp. 264-290, June 1989.
[13] I. L. Traiger, J. Gray, C. A. Galtieri, and B. G. Lindsay, "Transactions and consistency in distributed database systems," ACM Trans. Database Syst., vol. 7, pp. 323-342, Sept. 1982.
[14] F. Cristian, "Probabilistic clock synchronization," Distributed Comput., pp. 146-158, 1989.
[15] D. Skeen, "A formal model of crash recovery in a distributed system," IEEE Trans. Software Eng., vol. SE-9, no. 3, pp. 219-228, May 1983.
[16] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of distributed consensus with one faulty process," J. ACM, vol. 32, no. 2, pp. 374-382, Apr. 1985.
[17] D. Skeen, "A quorum-based commit protocol," Tech. Rep., Comput. Sci. TR 82-483, Cornell Univ., 1982.

Index Terms:
fault-tolerant protocol; atomic broadcast; loss; duplication; reordering; delay of messages; network partitioning; arbitrary network; fail-stop sites; Byzantine site behavior; majority-consensus decisions; decentralized termination protocol; performance analysis; fault tolerant computing; performance evaluation; protocols
S.W. Luan, V.D. Gligor, "A Fault-Tolerant Protocol for Atomic Broadcast," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 3, pp. 271-285, July 1990, doi:10.1109/71.80156