The Community for Technology Leaders
RSS Icon
Subscribe
Issue No.01 - January-March (2010 vol.7)
pp: 80-93
Yair Amir , Johns Hopkins University, Baltimore
Claudiu Danilov , Boeing Phantom Works, Seattle
Danny Dolev , Hebrew University of Jerusalem, Jerusalem
Jonathan Kirsch , Johns Hopkins University, Baltimore
John Lane , Johns Hopkins University, Baltimore
Cristina Nita-Rotaru , Purdue University, West Lafayette
Josh Olsen , University of California, Irvine, Irvine
David Zage , Purdue University, West Lafayette
ABSTRACT
This paper presents the first hierarchical Byzantine fault-tolerant replication architecture suitable to systems that span multiple wide-area sites. The architecture confines the effects of any malicious replica to its local site, reduces message complexity of wide-area communication, and allows read-only queries to be performed locally within a site for the price of additional standard hardware. We present proofs that our algorithm provides safety and liveness properties. A prototype implementation is evaluated over several network topologies and is compared with a flat Byzantine fault-tolerant approach. The experimental results show considerable improvement over flat Byzantine replication algorithms, bringing the performance of Byzantine replication closer to existing benign fault-tolerant replication techniques over wide area networks.
INDEX TERMS
Fault tolerance, scalability, wide area networks.
CITATION
Yair Amir, Claudiu Danilov, Danny Dolev, Jonathan Kirsch, John Lane, Cristina Nita-Rotaru, Josh Olsen, David Zage, "Steward: Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks", IEEE Transactions on Dependable and Secure Computing, vol.7, no. 1, pp. 80-93, January-March 2010, doi:10.1109/TDSC.2008.53
REFERENCES
[1] Y. Amir, C. Danilov, J. Kirsch, J. Lane, D. Dolev, C. Nita-Rotaru, J. Olsen, and D. Zage, "Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks," Proc. Int'l Conf. Dependable Systems and Networks (DSN '06), pp. 105-114, 2006.
[2] P.A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman, 1987.
[3] M.P. Herlihy and J.M. Wing, "Linearizability: A Correctness Condition for Concurrent Objects," ACM Trans. Programming Languages and Systems, vol. 12, no. 3, pp. 463-492, 1990.
[4] M. Castro and B. Liskov, "Practical Byzantine Fault Tolerance," Proc. Third Usenix Symp. Operating Systems Design and Implementation (OSDI '99), pp. 173-186, 1999.
[5] A. Avizeinis, "The N-Version Approach to Fault-Tolerant Software," IEEE Trans. Software Eng., vol. 11, no. 12, pp. 1491-1501, Dec. 1985.
[6] Genesis: A Framework for Achieving Component Diversity, http://www.cs.virginia.edugenesis/, 2008.
[7] B. Cox, D. Evans, A. Filipi, J. Rowanhill, W. Hu, J. Davidson, J. Knight, A. Nguyen-Tuong, and J. Hiser, "N-Variant Systems: A Secretless Framework for Security through Diversity," Proc. Usenix Security Symp. (Usenix-SS '06), pp. 105-120, 2006.
[8] Y. Amir, B. Coan, J. Kirsch, and J. Lane, "Customizable Fault Tolerance for Wide-Area Replication," Proc. 26th IEEE Int'l Symp. Reliable Distributed Systems (SRDS '07), pp. 65-82, 2007.
[9] M.J. Fischer, "The Consensus Problem in Unreliable Distributed Systems (A Brief Survey)," Fundamentals of Computation Theory, pp. 127-140, 1983.
[10] D. Dolev and H.R. Strong, "Authenticated Algorithms for Byzantine Agreement," SIAM J. Computing, vol. 12, no. 4, pp. 656-666, 1983.
[11] R.D. Schlichting and F.B. Schneider, "Fail-Stop Processors: An Approach to Designing Fault-Tolerant Computing Systems," Computer Systems, vol. 1, no. 3, pp. 222-238, 1983.
[12] "The Rampart Toolkit for Building High-Integrity Services," selected papers from the Int'l Workshop Theory and Practice in Distributed Systems, pp. 99-110, 1995.
[13] K.P. Kihlstrom, L.E. Moser, and P.M. Melliar-Smith, "The SecureRing Protocols for Securing Group Communication," Proc. 31st Ann. IEEE Hawaii Int'l Conf. System Sciences (HICSS '98), vol. 3, pp. 317-326, 1998.
[14] M. Cukier, T. Courtney, J. Lyons, H.V. Ramasamy, W.H. Sanders, M. Seri, M. Atighetchi, P. Rubel, C. Jones, F. Webber, P. Pal, R. Watro, and J. Gossett, "Providing Intrusion Tolerance with ITUA," Supplement of IEEE Int'l Conf. Dependable Systems and Networks (DSN '02), pp. C-5-1-C-5-3, 2002.
[15] H.V. Ramasamy, P. Pandey, J. Lyons, M. Cukier, and W.H. Sanders, "Quantifying the Cost of Providing Intrusion Tolerance in Group Communication Systems," Proc. IEEE Int'l Conf. Dependable Systems and Networks (DSN '02), pp. 229-238, 2002.
[16] K. Eswaran, J. Gray, R. Lorie, and I. Taiger, "The Notions of Consistency and Predicate Locks in a Database System," Comm. ACM, vol. 19, no. 11, pp. 624-633, 1976.
[17] D. Skeen, "A Quorum-Based Commit Protocol," Proc. Sixth Berkeley Workshop Distributed Data Management and Computer Networks, pp. 69-80, 1982.
[18] L. Lamport, "The Part-Time Parliament," ACM Trans. Computer Systems, vol. 16, no. 2, pp. 133-169, 1998.
[19] L. Lamport, "Paxos Made Simple," ACM SIGACT News, vol. 32, pp. 51-58, 2001.
[20] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin, "Separating Agreement from Execution for Byzantine Fault-Tolerant Services," Proc. 19th ACM Symp. Operating Systems Principles (SOSP '03), pp. 253-267, 2003.
[21] J.-P. Martin and L. Alvisi, "Fast Byzantine Consensus," IEEE Trans. Dependable and Secure Computing, vol. 3, no. 3, pp. 202-215, July-Sept. 2006.
[22] R. Rodrigues, P. Kouznetsov, and B. Bhattacharjee, "Large-Scale Byzantine Fault Tolerance: Safe but Not Always Live," Proc. Third Workshop Hot Topics in System Dependability (HotDep '07), p. 17, 2007.
[23] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong, "Zyzzyva: Speculative Byzantine Fault Tolerance," Proc. 21st ACM Symp. Operating Systems Principles (SOSP '07), pp. 45-58, 2007.
[24] D. Malkhi and M.K. Reiter, "Secure and Scalable Replication in Phalanx," Proc. 17th IEEE Int'l Symp. Reliable Distributed Systems (SRDS '98), pp. 51-60, 1998.
[25] D. Malkhi and M. Reiter, "Byzantine Quorum Systems," J. Distributed Computing, vol. 11, no. 4, pp. 203-213, 1998.
[26] D. Malkhi and M. Reiter, "An Architecture for Survivable Coordination in Large Distributed Systems," IEEE Trans. Knowledge and Data Eng., vol. 12, no. 2, pp. 187-202, Mar.-Apr. 2000.
[27] D. Malkhi, M. Reiter, D. Tulone, and E. Ziskind, "Persistent Objects in the Fleet System," The Second DARPA Information Survivability Conf. and Exposition (DISCEX '01), pp. 126-136, 2001.
[28] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie, "Fault-Scalable Byzantine Fault-Tolerant Services," Proc. 20th ACM Symp. Operating Systems Principles (SOSP '05), pp. 59-74, 2005.
[29] M. Correia, L.C. Lung, N.F. Neves, and P. Veríssimo, "Efficient Byzantine-Resilient Reliable Multicast on a Hybrid Failure Model," Proc. 21st IEEE Int'l Symp. Reliable Distributed Systems (SRDS '02), pp. 2-11, 2002.
[30] P. Veríssimo, "Uncertainty and Predictability: Can They Be Reconciled?" Future Directions in Distributed Computing, LNCS 2584, pp. 108-113, 2003.
[31] Y.G. Desmedt and Y. Frankel, "Threshold Cryptosystems," Proc. Ninth Ann. Int'l Cryptology Conf. (CRYPTO '89), pp. 307-315, 1989.
[32] A. Shamir, "How to Share a Secret," Comm. ACM, vol. 22, no. 11, pp. 612-613, 1979.
[33] V. Shoup, "Practical Threshold Signatures," LNCS 1807, pp. 207-223, 2000.
[34] R.L. Rivest, A. Shamir, and L.M. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Comm. ACM, vol. 21, no. 2, pp. 120-126, 1978.
[35] P. Feldman, "A Practical Scheme for Non-Interactive Verifiable Secret Sharing," Proc. 28th Ann. Symp. Foundations of Computer Science (FOCS '87), pp. 427-437, 1987.
[36] L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," Comm. ACM, vol. 21, no. 7, pp. 558-565, 1978.
[37] F.B. Schneider, "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial," ACM Computing Surveys, vol. 22, no. 4, pp. 299-319, 1990.
[38] M.J. Fischer, N.A. Lynch, and M.S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process," J. ACM, vol. 32, no. 2, pp. 374-382, 1985.
[39] The Spines Project, http:/www.spines.org/, 2008.
[40] Y. Amir and C. Danilov, "Reliable Communication in Overlay Networks," Proc. IEEE Int'l Conf. Dependable Systems and Networks (DSN '03), pp. 511-520, 2003.
[41] Planetlab, http:/www.planet-lab.org/, 2008.
[42] The CAIRN Network, http://www.isi.edu/div7CAIRN/, 2008.
[43] Y. Amir, C. Danilov, M. Miskin-Amir, J. Stanton, and C. Tutu, "On the Performance of Consistent Wide-Area Database Replication," Technical Report CNDS-2003-3, Dec. 2003.
26 ms
(Ver 2.0)

Marketing Automation Platform Marketing Automation Tool