Issue No.02 - March/April (2012 vol.9)
pp: 236-249
Fatemeh Borran, École Polytechnique Fédérale de Lausanne, Lausanne
Martin Hutle, Fraunhofer AISEC, Munich
Nuno Santos, École Polytechnique Fédérale de Lausanne, Lausanne
André Schiper, École Polytechnique Fédérale de Lausanne, Lausanne
ABSTRACT
Consensus is one of the key problems in fault-tolerant distributed computing. Although the solvability of consensus is now well understood, comparing different algorithms in terms of efficiency is still an open problem. In this paper, we address this question for round-based consensus algorithms using communication predicates, on top of a partially synchronous system that alternates between good and bad periods (synchronous and nonsynchronous periods). Communication predicates, together with the detailed timing information of the underlying partially synchronous system, provide a convenient and powerful framework for comparing different consensus algorithms and their implementations. This approach allows us to quantify the length of a good period required to solve a given number of consensus instances. Our results lead to several interesting observations, for example that the number of rounds of an algorithm is not necessarily a good metric for its performance.
INDEX TERMS
Distributed systems, fault tolerance, distributed algorithms, round-based model, consensus, system modeling.
CITATION
Fatemeh Borran, Martin Hutle, Nuno Santos, André Schiper, "Quantitative Analysis of Consensus Algorithms," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 2, pp. 236-249, March/April 2012, doi:10.1109/TDSC.2011.48