This Article 
 Bibliographic References 
 Add to: 
Fully Distributed Three-Tier Active Software Replication
July 2006 (vol. 17 no. 7)
pp. 633-645

Abstract—Keeping strongly consistent the state of the replicas of a software service deployed across a distributed system prone to crashes and with highly unstable message transfer delays (e.g., the Internet), is a real practical challenge. The solution to this problem is subject to the FLP impossibility result, and thus there is a need for "long enough” periods of synchrony with time bounds on process speeds and message transfer delays to ensure deterministic termination of any run of agreement protocols executed by replicas. This behavior can be abstracted by a partially synchronous computational model. In this setting, before reaching a period of synchrony, the underlying network can arbitrarily delay messages and these delays can be perceived as false failures by some timeout-based failure detection mechanism leading to unexpected service unavailability. This paper proposes a fully distributed solution for active software replication based on a three-tier software architecture well-suited to such a difficult setting. The formal correctness of the solution is proved by assuming the middle-tier runs in a partially synchronous distributed system. This architecture separates the ordering of the requests coming from clients, executed by the middle-tier, from their actual execution, done by replicas, i.e., the end-tier. In this way, clients can show up in any part of the distributed system and replica placement is simplified, since only the middle-tier has to be deployed on a well-behaving part of the distributed system that frequently respects synchrony bounds. This deployment permits a rapid timeout tuning reducing thus unexpected service unavailability.

[1] O. Bakr and I. Keidar, “Evaluating the Running Time of a Communication Round over the Internet,” Proc. 21st Ann. Symp. Principles of Distributed Computing, pp. 243-252, 2002.
[2] R. Baldoni and C. Marchetti, “Software Replication in Three-Tier Architectures: Is It a Real Challenge?” Proc. Eighth IEEE Workshop Future Trends of Distributed Computing Systems, pp. 133-139, Nov. 2001.
[3] R. Baldoni and C. Marchetti, “Three-Tier Replication for FT-Corba Infrastructures,” Software: Practice and Experience, vol. 33, no. 8, pp. 767-797, 2003.
[4] R. Baldoni, C. Marchetti, and S. Tucci-Piergiovanni, “Asynchronous Active Replication in Three-Tier Distributed Systems,” Proc. Ninth IEEE Pacific Rim Symp. Dependable Computing, pp. 19-26, 2002.
[5] B. Ban, “Design and Implementation of a Reliable Group Communication Toolkit for Java,” Cornell Univ., Sept. 1998.
[6] P. Bernstein, V. Hadzilacos, and H. Goodman, Concurrency Control and Recovery in Database Systems. Reading, Mass.: Addison-Wesley, 1987.
[7] P.A. Bernstein and E. Newcomer, Principles of Transaction Processing. Morgan-Kaufmann, 1997.
[8] K. Birman and T. Joseph, “Reliable Communication in the Presence of Failures,” ACM Trans. Computer Systems, vol. 5, no. 1, pp. 47-76, Feb. 1987.
[9] T. Chandra and S. Toueg, “Unreliable Failure Detectors for Reliable Distributed Systems,” J. ACM, pp. 225-267, Mar. 1996.
[10] T.D. Chandra, V. Hadzilacos, and S. Toueg, “The Weakest Failure Detector for Solving Consensus,” J. ACM, vol. 43, no. 4, pp. 685-722, July 1996.
[11] D. Chappel, “How Microsoft Transaction Server Changes the COM Programming Model,” Microsoft System J., 1998.
[12] M. Chérèque, D. Powell, P. Reynier, J.-L. Richier, and J. Voiron, “Active Replication in Delta-4,” FTCS, pp. 28-37, 1992.
[13] G.V. Chockler, I. Keidar, and R. Vitenberg, “Group Communications Specifications: A Comprehensive Study,” ACM Computing Surveys, vol. 33, no. 4, pp. 427-469, Dec. 2001.
[14] M. Correia, L.C. Lung, N.F. Neves, and P. Verissimo, “Efficient Byzantine-Resilient Reliable Multicast on a Hybrid Failure Model,” Proc. 21st IEEE Symp. Reliable Distributed Systems, pp. 2-11, Oct. 2002.
[15] F. Cristian, H. Aghili, R. Strong, and D. Dolev, “Atomic Broadcast: From Simple Diffusion to Byzantine Agreement,” Proc. 15th Int'l Conf. Fault-Tolerant Computing, 1985.
[16] X. Défago, “Agreement-Related Problems: From Semi Passive Replication to Totally Ordered Broadcast,” PhD thesis, École Polytechnique Fédérale de Lausanne, Switzerland, PhD thesis no. 2229, 2000.
[17] Z. Dianlong and W. Zorn, “End-to-End Transactions in Three-Tier Systems,” Proc. Third Int'l Symp. Distributed Objects and Applications (DOA '01), pp. 330-339, 2001.
[18] C. Dwork, N.A. Lynch, and L. Stockmeyer, “Consensus in the Presence of Partial Synchrony,” J. ACM, vol. 35, no. 2, pp. 288-323, Apr. 1988.
[19] M. Fischer, N. Lynch, and M. Patterson, “Impossibility of Distributed Consensus with One Faulty Process,” J. ACM, vol. 32, no. 2, pp. 374-382, Apr. 1985.
[20] R. Friedman and E. Hadad, “FTS: A High-Performance CORBA Fault-Tolerance Service,” Proc. Seventh IEEE Int'l Workshop Object-Oriented Real-Time Dependable Systems (WORDS '02), pp. 61-68, 2002.
[21] R. Friedman and A Vaysburd, “Fast Replicated State Machines over Partitionable Networks,” Proc. 16th IEEE Int'l Symp. Reliable Distributued Systems (SRDS), Oct. 1997.
[22] R. Guerraoui and S. Frølund, “Implementing E-Transactions with Asynchronous Replication,” IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 2, pp. 133-146, Feb. 2001.
[23] R. Guerraoui and A. Schiper, “Software-Based Replication for Fault Tolerance,” Computer, special issue on fault tolerance, vol. 30, pp. 68-74, Apr. 1997.
[24] R. Guerraoui and A. Schiper, “The Generic Consensus Service,” IEEE Trans. Software Eng., vol. 27, no. 1, pp. 29-41, Jan. 2001.
[25] V. Hadzilacos and S. Toueg, “Fault-Tolerant Broadcast and Related Problems,” Distributed Systems, S. Mullender, ed., chapter 16, Addison Wesley, 1993.
[26] M. Herlihy and J. Wing, “Linearizability: A Correctness Condition for Concurrent Objects,” ACM Trans. Programming Languages and Systems, vol. 12, no. 3, pp. 463-492, 1990.
[27] I. Keidar, “A Highly Available Paradigm for Consistent Object Replication,” Master's thesis, Inst. Computer Science, Hebrew Univ., Jerusalem, Israel, 1994.
[28] I. Keidar and D. Dolev, “Efficient Message Ordering in Dynamic Networks,” Proc. 15th ACM Symp. Principles of Distributed Computing (PODC), pp. 68-86, May 1996.
[29] B. Kemme and G. Alonso, “A Suite of Database Replication Protocols Based on Group Communications,” Proc. 18th Int'l Conf. Distributed Computing Systems (ICDCS), May 1998.
[30] L. Lamport, “Time, Clocks and the Ordering of Events in a Distributed System,” Comm. ACM, vol. 21, no. 7, pp. 558-565, 1978.
[31] S. Landis and S. Maffeis, “Building Reliable Distributed Systems with CORBA,” Theory and Practice of Object Systems, vol. 3, no. 1, 1997.
[32] C. Marchetti, “A Three-Tier Architecture for Active Software Replication,” technical report, PhD thesis, Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza,” 2003.
[33] C. Marchetti, S. Tucci-Piergiovanni, and R. Baldoni, “A Three-Tier Replication Protocol for Large Scale Distributed Systems,” IEICE Trans. Information Systems, special issue on dependable computing (selection of PRDC-02 papers), vol. 86-D, no. 12, pp. 2544-2552, 2003.
[34] L. Moser, P.M. Melliar-Smith, D. Agarwal, R. Budhia, and C. Lingley-Papadopoulos, “Totem: A Fault-Tolerant Multicast Group Communication System,” Comm. ACM, vol. 39, no. 4, pp. 54-63, Apr. 1996.
[35] L.E. Moser, P.M. Melliar-Smith, and P. Narasimhan, “Consistent Object Replication in the Eternal System,” Theory and Practice of Object Systems, vol. 4, no. 3, pp. 81-92, 1998.
[36] D. Powell, G. Bonn, D. Seaton, P. Verissimo, and F. Waeselynck, “The Delta-4 Approach to Dependability in Open Distributed Computing Systems,” Proc. 18th IEEE Int'l Symp. Fault-Tolerant Computing (FTCS-18), June 1988.
[37] F.B. Schneider, “Replication Management Using the State Machine Approach,” Distributed Systems, S. Mullender, ed., ACM Press, Addison Wesley, 1993.
[38] A. Vaysburd, “Fault Tolerance in Three-Tier Applications: Focusing on the Database Tier,” Proc. Int'l Workshop Reliable Middleware Systems (WREMI '99), pp. 322-327, 1999.
[39] P. Verissimo and A. Casimiro, “The Timely Computing Base Model and Architecture,” IEEE Trans. Computers, special issue on asynchronous real-time systems, vol. 51, no. 8, pp. 916-930, Aug. 2002.
[40] P. Verissimo, A. Casimiro, and C. Fetzer, “The Timely Computing Base: Timely Actions in the Presence of Uncertain Timeliness,” Proc. First Int'l Conf. Dependable Systems and Networks, 2000.
[41] G.R. Voth, C. Kindel, and J. Fujioka, “Distributed Application Development for Three-Tier Architectures: Microsoft on Windows DNA,” IEEE Internet Computing, vol. 2, no. 2, pp. 41-45, 1998.
[42] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin, “Separating Agreement from Execution for Byzantine Fault Tolerant Services,” Proc. 19th ACM Symp. Operating Systems Principles, pp. 253-267, 2003.
[43] W. Zhao, L.E. Moser, and P.M. Melliar-Smith, “Unification of Replication and Transaction Processing in Three-Tier Architectures,” Proc. 22nd Int'l Conf. Distributed Computing Systems (ICDCS '02), pp. 263-270, 2002.

Index Terms:
Dependable distributed systems, software replication in wide-area networks, replication protocols, architectures for dependable services.
Carlo Marchetti, Roberto Baldoni, Sara Tucci-Piergiovanni, Antonino Virgillito, "Fully Distributed Three-Tier Active Software Replication," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 7, pp. 633-645, July 2006, doi:10.1109/TPDS.2006.89
Usage of this product signifies your acceptance of the Terms of Use.