Broadcast Protocols for Distributed Systems
January 1990 (vol. 1 no. 1)
pp. 17-25

An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols: the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations, such as locking, update, and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms.
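
The claim that an operation such as locking needs only a single broadcast can be illustrated with a minimal sketch. It is not the Trans or Total protocol, whose details the abstract does not give. It assumes that some ordering layer already delivers the same totally ordered sequence of messages to every node. The names `Replica`, `LockRequest`, and `run` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical message shape (sender, lock name); not taken from the paper.
LockRequest = Tuple[str, str]


@dataclass
class Replica:
    """One node's local lock table, updated only from delivered messages."""
    owners: Dict[str, str] = field(default_factory=dict)

    def deliver(self, request: LockRequest) -> None:
        sender, lock = request
        # The first request for a lock in the total order wins;
        # later requests for the same lock are ignored.
        self.owners.setdefault(lock, sender)


def run(total_order: List[LockRequest], n_replicas: int) -> List[Dict[str, str]]:
    """Deliver the same ordered sequence to every replica (assumed given)."""
    replicas = [Replica() for _ in range(n_replicas)]
    for request in total_order:
        for replica in replicas:
            replica.deliver(request)
    return [replica.owners for replica in replicas]


# Nodes A and B each broadcast one request for lock "x"; A also requests "y".
tables = run([("B", "x"), ("A", "x"), ("A", "y")], n_replicas=3)
```

Because every replica applies the same sequence with the same deterministic rule, all lock tables agree without any further exchange of messages. The sketch ignores faults, message loss, and how the order itself is established, which are the problems the two protocols address.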

Index Terms:
total message order; reliable distributed operations; fault-tolerant distributed systems; message exchange; consensus agreement; broadcast communication; local area network; Ethernet; token ring; Trans protocol; Total protocol; total order; distributed agreement; locking; update; commitment; concurrency control; distributed processing; fault tolerant computing; local area networks; protocols
P.M. Melliar-Smith, L.E. Moser, V. Agrawala, "Broadcast Protocols for Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 1, pp. 17-25, Jan. 1990, doi:10.1109/71.80121