This Article 
 Bibliographic References 
 Add to: 
Automatic Reconfiguration for Large-Scale Reliable Storage Systems
March/April 2012 (vol. 9 no. 2)
pp. 145-158
Rodrigo Rodrigues, Max Planck Institute for Software Systems, Saarbrücken
Barbara Liskov, Massachusetts Institute of Technology, Cambridge
Kathryn Chen, Massachusetts Institute of Technology, Cambridge
Moses Liskov, College of William and Mary, Williamsburg
David Schultz, Massachusetts Institute of Technology, Cambridge
Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well, in practice: the membership service is able to manage a large system and the cost to change the system membership is low.

[1] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon's Highly Available Key-Value Store,” Proc. 21st ACM Symp. Operating Systems Principles, pp. 205-220, 2007.
[2] J. Dean,“Designs, Lessons and Advice from Building Large Distributed Systems,” Proc. Third ACM SIGOPS Int'l Workshop Large Scale Distributed Systems and Middleware (LADIS '09), Keynote talk, 2009.
[3] Amazon S3 Availability Event, , July 2008.
[4] K. Birman and T. Joseph, “Exploiting Virtual Synchrony in Distributed Systems,” Proc. 11th ACM Symp. Operating Systems Principles, pp. 123-138, Nov. 1987.
[5] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications,” Proc. ACM SIGCOMM, 2001.
[6] M. Reiter, “A Secure Group Membership Protocol,” IEEE Trans. Software Eng., vol. 22, no. 1, pp. 31-42, Jan. 1996.
[7] H.D. Johansen, A. Allavena, and R. van Renesse, “Fireflies: Scalable Support for Intrusion-Tolerant Network Overlays,” Proc. European Conf. Computer Systems (EuroSys '06) , pp. 3-13, 2006.
[8] J. Cowling, D.R.K. Ports, B. Liskov, R.A. Popa, and A. Gaikwad, “Census: Location-Aware Membership Management for Large-Scale Distributed Systems,” Proc. Ann. Technical Conf. (USENIX '09), June 2009.
[9] D. Oppenheimer, A. Ganapathi, and D.A. Patterson, “Why Do Internet Services Fail, and What Can Be Done About It?” Proc. Fourth USENIX Symp. Internet Technologies and Systems (USITS '03), Mar. 2003.
[10] M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” Proc. Third Symp. Operating Systems Design and Implementation (OSDI '99), Feb. 1999.
[11] L. Zhou, F.B. Schneider, and R. van Renesse, “Coca: A Secure Distributed On-Line Certification Authority,” ACM Trans. Computer Systems, vol. 20, no. 4, pp. 329-368, Nov. 2002.
[12] M. Bellare and S. Miner, “A Forward-Secure Digital Signature Scheme,” Proc. 19th Ann. Int'l Cryptology Conf. Advances in Cryptology (CRYPTO '99), pp. 431-448, 1999.
[13] R. Canetti, S. Halevi, and J. Katz, “A Forward-Secure Public-Key Encryption Scheme,” Proc. Conf. Advances in Crptology (EUROCRYPT '03) , pp. 255-271, 2003.
[14] F. Dabek, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Wide-Area Cooperative Storage with CFS,” Proc. 18th ACM Symp. Operating Systems Principles (SOSP '01), Oct. 2001.
[15] A. Herzberg, M. Jakobsson, S. Jarecki, H. Krawczyk, and M. Yung, “Proactive Public Key and Signature Systems,” Proc. Fourth ACM Conf. Computer and Comm. Security (CCCS '97), pp. 100-110, 1997.
[16] C. Cachin, K. Kursawe, A. Lysyanskaya, and R. Strobl, “Asynchronous Verifiable Secret Sharing and Proactive Cryptosystems,” Proc. Ninth ACM Conf. Computer and Comm. Security, pp. 88-97, 2002.
[17] T.M. Wong, C. Wang, and J.M. Wing, “Verifiable Secret Redistribution for Archive Systems,” Proc. First Int'l IEEE Security in Storage Workshop, pp. 94-105, 2002.
[18] D. Schultz, B. Liskov, and M. Liskov, “Brief Announcement: Mobile Proactive Secret Sharing,” Proc. 27th Ann. Symp. Principles of Distributed Computing (PODC '08), Aug. 2008.
[19] J. Douceur, “The Sybil Attack,” Proc. First Int'l Workshop Peer-to-Peer Systems (IPTPS '02), 2002.
[20] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the WWW,” Proc. 29th Ann. ACM Symp. Theory of Computing (STOC '97), pp. 654-663, May 1997.
[21] R. Rodrigues, “Robust Services in Dynamic Systems,” PhD dissertation, Massachusetts Inst. of Tech nology, Feb. 2005.
[22] G. Di Crescenzo, N. Ferguson, R. Impagliazzo, and M. Jakobsson, “How to Forget a Secret,” Proc. 16th Ann. Symp. Theoretical Aspects of Computer Science (STACS '99), pp. 500-509, Mar. 1999.
[23] R.C. Merkle, “A Digital Signature Based on a Conventional Encryption Function,” Proc. Conf. Theory and Applications of Cryptographic Techniques on Advances in Cryptology (CRYPTO '87), 1987.
[24] A. Clement, M. Marchetti, E. Wong, L. Alvisi, and M. Dahlin, “Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults,” Proc. Sixth USENIX Symp. Networked Systems Design and Implementation (NSDI '09), Apr. 2009.
[25] D. Malkhi and M. Reiter, “Secure and Scalable Replication in Phalanx,” Proc. 17th Symp. Reliable Distributed Systems, Oct. 1998.
[26] R. Rodrigues and B. Liskov, “A Correctness Proof for a Byzantine-Fault-Tolerant Read/Write Atomic Memory with Dynamic Replica Membership,” MIT LCS TR/920, Sept. 2003.
[27] N. Lynch, Distributed Algorithms. Morgan Kaufmann Publishers, 1996.
[28] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, “Ivy: A Read/Write Peer-to-Peer File System,” Proc. Fifth Symp. Operating Systems Design and Implementation (OSDI '02), Dec. 2002.
[29] B. Liskov and R. Rodrigues, “Tolerating Byzantine Faulty Clients in a Quorum System,” Proc. 26th IEEE Int'l Conf. Distributed Computing Systems (ICDCS '06), 2006.
[30] T.D. Chandra, V. Hadzilacos, S. Toueg, and B. Charron-Bost, “On the Impossibility of Group Membership,” Proc. 15th Ann. ACM Symp. Principles of Distributed Computing (PODC '96), pp. 322-330, 1996.
[31] R. Rodrigues, M. Castro, and B. Liskov, “BASE: Using Abstraction to Improve Fault Tolerance,” Proc. 18th ACM Symp. Operating System Principles (SOSP '01), 2001.
[32] K. Chen, “Authentication in a Reconfigurable Byzantine Fault Tolerant System,” master's thesis, Massachusetts Inst. of Tech nology, July 2004.
[33] D. Mazières, “A Toolkit for User-Level File Systems,” Proc. Ann. Technical Conf. (USENIX '02), pp. 261-274, June 2001.
[34] M. Castro, “Practical Byzantine Fault Tolerance,” PhD dissertation, Massachusetts Inst. of Tech nology, 2001.
[35] M. Reiter, “The Rampart Toolkit for Building High-Integrity Services,” Proc. Int'l Workshop Theory and Practice in Distributed Systems, pp. 99-110, 1995.
[36] K. Kihlstrom, L. Moser, and P. Melliar-Smith, “The Secure Ring Protocols for Securing Group Communication,” Proc. Hawaii Int'l Conf. System Sciences, Jan. 1998.
[37] R. Guerraoui and A. Schiper, “The Generic Consensus Service,” IEEE Trans. Software Eng., vol. 27, no. 1, pp. 29-41, Jan. 2001.
[38] M. Castro, P. Druschell, A. Ganesh, A. Rowstron, and D. Wallach, “Security for Structured Peer-to-Peer Overlay Networks,” Proc. Fifth Symp. Operating Systems Design and Implementation (OSDI '02), Dec. 2002.
[39] N. Lynch and A.A. Shvartsman, “Rambo: A Reconfigurable Atomic Memory Service,” Proc. 16th Int'l Symp. Distributed Computing (DISC '02), 2002.
[40] J.R. Lorch, A. Adya, W.J. Bolosky, R. Chaiken, J.R. Douceur, and J. Howell, “The Smart Way to Migrate Replicated Stateful Services,” Proc. European Conf. Computer Systems (EuroSys '06), pp. 103-115, 2006.
[41] L. Alvisi, D. Malkhi, E. Pierce, M. Reiter, and R. Wright, “Dynamic Byzantine Quorum Systems,” Proc. Int'l Conf. Dependable Systems and Networks (DSN '00), pp. 283-292, June 2000.
[42] L. Kong, A. Subbiah, M. Ahamad, and D.M. Blough, “A Reconfigurable Byzantine Quorum Approach for the Agile Store,” Proc. 22nd IEEE Symp. Reliable Distributed Systems, Oct. 2003.
[43] J.-P. Martin and L. Alvisi, “A Framework for Dynamic Byzantine Storage,” Proc. Int'l Conf. Dependable Systems and Networks (DSN '04), June 2004.
[44] R. Rodrigues and B. Liskov, “Reconfigurable Byzantine-Fault-Tolerant Atomic Memory,” Proc. 23rd Ann. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing (PODC '04), p. 386, July 2004.
[45] H. Weatherspoon, P.R. Eaton, B.-G. Chun, and J. Kubiatowicz, “Antiquity: Exploiting a Secure Log for Wide-Area Distributed Storage,” Proc. European Conf. Computer Systems (EuroSys '07), pp. 371-384, 2007.
[46] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, “Oceanstore: An Architecture for Global-Scale Persistent Storage,” Proc. Ninth Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pp. 190-201, 2000.
[47] S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz, “Pond: The Oceanstore Prototype,” Proc. Second USENIX Conf. File and Storage Technologies (FAST '03), Mar. 2003.
[48] A. Adya, W.J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J.R. Douceur, J. Howell, J.R. Lorch, M. Theimer, and R.P. Wattenhofer, “Farsite: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment,” Proc. Fifth Symp. Operating Systems Design and Implementation (OSDI '02), Dec. 2002.

Index Terms:
Byzantine fault tolerance, membership service, dynamic system membership, distributed hash tables.
Rodrigo Rodrigues, Barbara Liskov, Kathryn Chen, Moses Liskov, David Schultz, "Automatic Reconfiguration for Large-Scale Reliable Storage Systems," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 2, pp. 145-158, March-April 2012, doi:10.1109/TDSC.2010.52
Usage of this product signifies your acceptance of the Terms of Use.