A Versatile Family of Consensus Protocols Based on Chandra-Toueg's Unreliable Failure Detectors
April 2002 (vol. 51 no. 4)
pp. 395-408

This paper is about consensus protocols for asynchronous distributed systems that are prone to process crashes but equipped with Chandra-Toueg's unreliable failure detectors. It presents a unifying approach based on two orthogonal versatility dimensions. The first concerns the class of the underlying failure detector: an instantiation can use any failure detector of the class S (provided that at least one process does not crash) or of the class ◇S (provided that a majority of processes do not crash). The second versatility dimension concerns the message exchange pattern used during each round of the protocol. This pattern (and, consequently, the per-round message cost) can be chosen separately for each round, ranging from O(n) messages (centralized pattern) to O(n²) messages (fully distributed pattern), where n is the number of processes. The resulting versatile protocol has several useful features and gives rise to a large, well-identified family of failure detector-based consensus protocols. Interestingly, this family includes both new protocols and some well-known ones (e.g., Chandra-Toueg's protocol). The approach is also interesting from a methodological point of view: it precisely characterizes the two sets of processes that, during a round, must receive messages for a decision to be taken (liveness) and for a single value to be decided (safety), respectively. Finally, the versatility of the protocol is not restricted to failure detectors: a simple timer-based instance yields a consensus protocol suited to partially synchronous systems.
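The resilience conditions and message costs stated in the abstract can be made concrete with a small counting sketch. It is not the paper's protocol: the two-step centralized round (estimates gathered by a coordinator, then the coordinator's value sent back) and the single all-to-all exchange used for the distributed pattern are simplifying assumptions, and all function names are hypothetical. The last function checks by brute force a classical fact commonly used to argue safety in ◇S-based protocols: any two majorities of the n processes share at least one process, whereas two halves of an even-sized system need not. The string "<>S" stands for ◇S.

```python
from itertools import combinations


def max_crashes_tolerated(n: int, fd_class: str) -> int:
    """Largest number f of crashes allowed by the resilience conditions.

    "S":   at least one process must not crash   -> f <= n - 1
    "<>S": a majority of processes must not crash -> n - f > n / 2, i.e. f <= (n - 1) // 2
    """
    if fd_class == "S":
        return n - 1
    if fd_class == "<>S":
        return (n - 1) // 2
    raise ValueError(f"unknown failure detector class: {fd_class!r}")


def round_messages(n: int, pattern: str) -> int:
    """Point-to-point messages per round under a simplified (assumed) counting model.

    "centralized": the n - 1 other processes send their estimate to a coordinator,
                   which sends its value back to them: 2 (n - 1) messages, O(n).
    "distributed": every process sends to every other process: n (n - 1) messages, O(n^2).
    """
    if pattern == "centralized":
        return 2 * (n - 1)
    if pattern == "distributed":
        return n * (n - 1)
    raise ValueError(f"unknown pattern: {pattern!r}")


def quorums_intersect(n: int, size: int) -> bool:
    """Brute-force check that every two subsets of `size` processes share a process.

    Checking only sets of the minimum majority size n // 2 + 1 is enough for all
    majorities, since every larger majority contains one of them.
    """
    quorums = [set(c) for c in combinations(range(n), size)]
    return all(a & b for a in quorums for b in quorums)


if __name__ == "__main__":
    for n in (4, 5, 7):
        print(
            f"n={n}: f<={max_crashes_tolerated(n, 'S')} with S, "
            f"f<={max_crashes_tolerated(n, '<>S')} with <>S, "
            f"{round_messages(n, 'centralized')} vs {round_messages(n, 'distributed')} msgs/round, "
            f"majorities intersect: {quorums_intersect(n, n // 2 + 1)}"
        )
    # Two halves of an even-sized system can be disjoint, e.g. {0, 1} and {2, 3} for n = 4.
    print(quorums_intersect(4, 2))
```

Under this counting model, raising n from 5 to 7 grows the centralized cost linearly (8 to 12 messages) and the distributed cost quadratically (20 to 42 messages), which illustrates the O(n) to O(n²) spread between the two extreme patterns.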

[1] M.K. Aguilera, W. Chen, and S. Toueg, “Failure Detection and Consensus in the Crash-Recovery Model,” Distributed Computing, vol. 13, no. 2, pp. 99-125, 2000.
[2] M.K. Aguilera and S. Toueg, “Failure Detection and Randomization: A Hybrid Approach to Solve Consensus,” SIAM J. Computing, vol. 28, no. 3, pp. 890-903, 1998.
[3] J. Brzezinski, J.M. Helary, M. Raynal, and M. Singhal, “Deadlock Models and Generalized Algorithm for Distributed Deadlock Detection,” J. Parallel and Distributed Computing, vol. 31, no. 2, pp. 112-125, Dec. 1995.
[4] T.D. Chandra and S. Toueg, “Unreliable Failure Detectors for Reliable Distributed Systems,” J. ACM, vol. 43, no. 2, pp. 225-267, 1996.
[5] T.D. Chandra, V. Hadzilacos, and S. Toueg, “The Weakest Failure Detector for Solving Consensus,” J. ACM, vol. 43, no. 4, pp. 685-722, July 1996.
[6] C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the Presence of Partial Synchrony,” J. ACM, vol. 35, no. 2, pp. 288-323, Apr. 1988.
[7] M.J. Fischer, N.A. Lynch, and M.S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process,” J. ACM, vol. 32, no. 2, pp. 374-382, 1985.
[8] F. Greve, M. Hurfin, R. Macêdo, and M. Raynal, “Time and Message Efficient S-Based Consensus,” Brief announcement, Proc. 19th ACM SIGACT-SIGOPS Int'l Symp. Principles of Distributed Computing (PODC '00), p. 332, July 2000.
[9] V. Hadzilacos and S. Toueg, “Fault-Tolerant Broadcasts and Related Problems,” Distributed Systems, S. Mullender, ed., pp. 97-138, New York: ACM Press, 1993.
[10] J.M. Helary, A. Mostéfaoui, and M. Raynal, “A General Scheme for Token- and Tree-Based Distributed Mutual Exclusion Algorithms,” IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 11, pp. 1185-1196, Nov. 1994.
[11] M. Hurfin, A. Mostéfaoui, and M. Raynal, “Consensus in Asynchronous Systems Where Processes Can Crash and Recover,” Proc. 17th IEEE Symp. Reliable Distributed Systems, pp. 280-286, 1998.
[12] M. Hurfin and M. Raynal, “A Simple and Fast Asynchronous Consensus Protocol Based on a Weak Failure Detector,” Distributed Computing, vol. 12, no. 4, pp. 209-223, 1999.
[13] A. Kshemkalyani and M. Singhal, “On Characterization and Correctness of Distributed Deadlock Detection,” J. Parallel and Distributed Computing, vol. 22, no. 1, pp. 44-59, July 1994.
[14] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.
[15] A. Mostéfaoui, S. Rajsbaum, and M. Raynal, “Conditions on Input Vectors for Consensus Solvability in Asynchronous Distributed Systems,” Proc. 33rd ACM Symp. Theory of Computing (STOC '01), pp. 153-162, July 2001.
[16] A. Mostéfaoui, S. Rajsbaum, M. Raynal, and M. Roy, “A Hierarchy of Conditions for Consensus Solvability,” Proc. 20th ACM Symp. Principles of Distributed Computing (PODC '01), pp. 151-160, Aug. 2001.
[17] A. Mostéfaoui and M. Raynal, “Solving Consensus Using Chandra-Toueg's Unreliable Failure Detectors: A Generic Quorum-Based Approach,” Proc. 13th Int'l Symp. Distributed Computing (DISC '99), pp. 49-63, 1999.
[18] A. Mostéfaoui and M. Raynal, “Consensus Based on Failure Detectors with a Perpetual Weak Accuracy Property,” Proc. IEEE Int'l Parallel and Distributed Processing Symp., pp. 514-519, May 2000.
[19] A. Mostéfaoui, M. Raynal, and F. Tronel, “The Best of Both Worlds: A Hybrid Approach to Solve Consensus,” Proc. Int'l Conf. Dependable Systems and Networks (DSN '00, previously FTCS), pp. 513-522, June 2000.
[20] R. Oliveira, R. Guerraoui, and A. Schiper, “Consensus in the Crash-Recovery Model,” Research Report 97-239, Département d'informatique, EPFL, Lausanne, Aug. 1997.
[21] L. Rodrigues and P. Veríssimo, “Topology-Aware Algorithms for Large Scale Communication,” Advances in Distributed Systems, pp. 1217-1256, 2000.
[22] B. Sanders, “The Information Structure of Distributed Mutual Exclusion Algorithms,” ACM Trans. Computer Systems, vol. 5, no. 3, pp. 284-299, Aug. 1987.
[23] A. Schiper, “Early Consensus in an Asynchronous System with a Weak Failure Detector,” Distributed Computing, vol. 10, no. 3, pp. 149-157, Mar./Apr. 1997.
[24] D. Skeen, “Non-Blocking Commit Protocols,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 133-142, New York: ACM Press, 1981.
[25] J. Yang, G. Neiger, and E. Gafni, “Structured Derivations of Consensus Algorithms for Failure Detectors,” Proc. 17th ACM Symp. Principles of Distributed Computing, pp. 297-308, 1998.

Index Terms:
asynchronous distributed system, consensus problem, consensus protocols, crash failure, fault-tolerance, quorum, unreliable failure detector
Citation:
M. Hurfin, A. Mostéfaoui, M. Raynal, "A Versatile Family of Consensus Protocols Based on Chandra-Toueg's Unreliable Failure Detectors," IEEE Transactions on Computers, vol. 51, no. 4, pp. 395-408, April 2002, doi:10.1109/12.995450