Message-Optimal Protocols for Fault-Tolerant Broadcasts/Multicasts in Distributed Systems with Crash Failures
February 1995 (vol. 44 no. 2)
pp. 346-352

Abstract—An essential feature in any fault-tolerant design of distributed systems is a mechanism by which a process can reliably broadcast information to other processes in the presence of failures. This paper studies the message complexity of fault-tolerant broadcast protocols in weakly synchronous and totally asynchronous distributed systems with point-to-point communication links, where the system failures are caused by the processes but the communication links are completely reliable. We focus on the number of messages required of any fault-tolerant protocol in failure-free executions. Our motivation is that one should incur the cost of handling failures only when they actually occur. We present protocols that, in an n-process system subject to at most t crash failures where 1 ≤ t < (n - 1), guarantee the delivery of a message from any process to other nonfaulty processes. In the absence of crash failures, our protocols require (n + t - 1) messages in the weakly synchronous model and (t + 1)(n - 1 - (t/2)) messages in the totally asynchronous model. Moreover, we show that in both cases our protocols are optimal with respect to message complexity. The new insights provided in our lower bound proofs also yield graph-theoretic characterizations of all message-optimal reliable broadcast protocols in failure-free executions. Both the upper and lower bound results on broadcast protocols can be generalized to multicast protocols, where a process only needs to deliver a message to a subset of processes in the system.
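
To make the two failure-free message counts concrete, the short Python sketch below evaluates them for a given n and t. It only plugs numbers into the formulas quoted in the abstract; it does not implement the protocols. The function name and the dictionary keys are illustrative assumptions, and the constraint on t is read as 1 ≤ t < n - 1.

```python
def failure_free_message_counts(n: int, t: int) -> dict:
    """Evaluate the failure-free message counts quoted in the abstract.

    n: number of processes; t: maximum number of crash failures tolerated.
    The function name and result keys are hypothetical, chosen for illustration.
    """
    if not (1 <= t < n - 1):
        raise ValueError("requires 1 <= t < n - 1")
    # Weakly synchronous model: n + t - 1 messages.
    weakly_synchronous = n + t - 1
    # Totally asynchronous model: (t + 1)(n - 1 - t/2) messages, expanded as
    # (t + 1)(n - 1) - t(t + 1)/2 so the arithmetic stays in integers
    # (t(t + 1) is always even).
    totally_asynchronous = (t + 1) * (n - 1) - t * (t + 1) // 2
    return {
        "weakly_synchronous": weakly_synchronous,
        "totally_asynchronous": totally_asynchronous,
    }


print(failure_free_message_counts(10, 3))
# prints {'weakly_synchronous': 12, 'totally_asynchronous': 30}
```

For n = 10 and t = 3, the weakly synchronous count is 12, only t more than the n - 1 = 9 messages needed just to reach every other process once, while the totally asynchronous count is 30.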

Index Terms— Reliable broadcasts/multicasts, distributed computing, network protocols, fault tolerance, message complexity.

Hong-Yi Tzeng, Kai-Yeung Siu, "Message-Optimal Protocols for Fault-Tolerant Broadcasts/Multicasts in Distributed Systems with Crash Failures," IEEE Transactions on Computers, vol. 44, no. 2, pp. 346-352, Feb. 1995, doi:10.1109/12.364545