Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems and Its Use in Quorum-Based Replication
Issue No. 05 - September/October (2003 vol. 15)
Luís Rodrigues, IEEE
Abstract: Atomic Broadcast is a fundamental problem in distributed systems: it requires that messages be delivered in the same order to all of their destination processes. This paper describes a solution to this problem in asynchronous distributed systems in which processes can crash and recover. Chandra and Toueg designed a Consensus-based solution to the Atomic Broadcast problem for asynchronous distributed systems in which crashed processes do not recover. We extend their approach so that it transforms any Consensus protocol suited to the crash-recovery model into an Atomic Broadcast protocol suited to the same model. We show that Atomic Broadcast can be implemented with few log operations beyond those already required by Consensus. The paper also discusses how additional log operations can improve the protocol, yielding faster recovery and better throughput. To illustrate the use of the protocol, the paper describes a solution to the replica management problem in asynchronous distributed systems in which processes can crash and recover. The proposed technique bridges established results on Weighted Voting and recent results on the Consensus problem.
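The Consensus-to-Atomic-Broadcast transformation mentioned above can be illustrated with a minimal Python sketch in the style of Chandra and Toueg's reduction. It is not the paper's crash-recovery protocol. It assumes a hypothetical `consensus(k, proposal)` oracle (simulated here by a shared in-memory table in which the first proposal for round k wins), string messages that every process orders identically with `sorted()`, and a `log` dictionary standing in for stable storage. Each process proposes its undelivered messages to the k-th Consensus instance, delivers the decided batch in a deterministic order, and after a crash rebuilds its delivery sequence by replaying the logged decisions in round order.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical Consensus interface: (round k, proposed batch) -> decided batch.
# Any Consensus protocol suited to the crash-recovery model could sit behind it.
ConsensusFn = Callable[[int, frozenset], frozenset]


def make_shared_consensus() -> ConsensusFn:
    """Simulated Consensus: the first proposal for round k is decided by all."""
    decisions: dict = {}

    def consensus(k: int, proposal: frozenset) -> frozenset:
        # setdefault gives agreement (one value per round) and validity
        # (the decided value is some process's proposal).
        return decisions.setdefault(k, proposal)

    return consensus


@dataclass
class AtomicBroadcastProcess:
    consensus: ConsensusFn
    log: dict = field(default_factory=dict)        # stand-in for stable storage
    received: set = field(default_factory=set)     # volatile: lost on a crash
    delivered: list = field(default_factory=list)  # volatile: rebuilt from log
    next_round: int = 0

    def a_broadcast(self, msg: str) -> None:
        # Assumes the message has already been disseminated to this process.
        self.received.add(msg)

    def run_round(self) -> None:
        """Propose all received-but-undelivered messages to Consensus k."""
        pending = frozenset(self.received - set(self.delivered))
        if not pending:
            return
        k = self.next_round
        decided = self.consensus(k, pending)
        self.log[k] = decided  # record the decision before delivering it
        self._deliver(decided)
        self.next_round = k + 1

    def _deliver(self, batch: frozenset) -> None:
        # Same deterministic order inside a batch on every process;
        # skip messages already delivered in an earlier round.
        for m in sorted(batch):
            if m not in self.delivered:
                self.delivered.append(m)

    def crash_and_recover(self) -> None:
        """Drop volatile state, then replay logged decisions in round order."""
        self.received = set()
        self.delivered = []
        for k in sorted(self.log):
            self._deliver(self.log[k])
        self.next_round = max(self.log, default=-1) + 1


# Usage: two processes sharing the simulated Consensus service.
shared = make_shared_consensus()
p, q = AtomicBroadcastProcess(shared), AtomicBroadcastProcess(shared)
for proc in (p, q):
    proc.a_broadcast("b")
    proc.a_broadcast("a")
p.run_round()
q.run_round()
p.crash_and_recover()  # p's delivery sequence is rebuilt from its log
```

Because every process delivers batch k before batch k+1 and orders each batch the same way, all processes deliver the same sequence. The sketch does not log messages that were broadcast but not yet decided, so such messages are lost on a crash here; a real crash-recovery protocol has to address that case.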
Index Terms: Distributed fault-tolerance, asynchronous systems, atomic broadcast, consensus, crash/recovery, quorum, replica management, weighted voting.
M. Raynal and L. Rodrigues, "Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems and Its Use in Quorum-Based Replication," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, pp. 1206-1217, 2003.