Issue No. 07 - July (2006 vol. 17)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2006.89
<p><b>Abstract</b>: Keeping the state of the replicas of a software service strongly consistent is a real practical challenge when the service is deployed across a distributed system that is prone to crashes and has highly unstable message transfer delays (e.g., the Internet). The solution to this problem is subject to the FLP impossibility result, so "long enough" periods of synchrony, with time bounds on process speeds and message transfer delays, are needed to ensure deterministic termination of any run of the agreement protocols executed by replicas. This behavior can be abstracted by a partially synchronous computational model. In this setting, before a period of synchrony is reached, the underlying network can delay messages arbitrarily, and a timeout-based failure detection mechanism can mistake these delays for failures, leading to unexpected service unavailability. This paper proposes a fully distributed solution for active software replication based on a three-tier software architecture well suited to such a difficult setting. The formal correctness of the solution is proved under the assumption that the middle tier runs in a partially synchronous distributed system. The architecture separates the ordering of the requests coming from clients, which is performed by the middle tier, from their actual execution, which is done by the replicas, i.e., the end tier. In this way, clients can show up in any part of the distributed system and replica placement is simplified, since only the middle tier has to be deployed on a well-behaving part of the distributed system that frequently respects synchrony bounds. This deployment permits rapid timeout tuning, thus reducing unexpected service unavailability.</p>
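The separation between ordering and execution described in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol: the names `Sequencer`, `Replica`, `order`, `deliver`, and `demo` are hypothetical, and the middle tier is reduced to a single non-replicated, in-memory sequencer, whereas the paper's middle tier is itself distributed and relies on agreement under partial synchrony. The sketch only shows why replicas that execute requests in one agreed order end up in the same state, whatever order the messages arrive in.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Hypothetical stand-in for the middle tier: assigns a total order to requests."""
    next_seq: int = 0
    assigned: dict = field(default_factory=dict)  # request id -> sequence number

    def order(self, request_id):
        # A re-submitted request gets the number it was given the first time.
        if request_id not in self.assigned:
            self.assigned[request_id] = self.next_seq
            self.next_seq += 1
        return self.assigned[request_id]


@dataclass
class Replica:
    """Hypothetical stand-in for an end-tier replica: executes in sequence order."""
    log: list = field(default_factory=list)      # requests executed so far
    next_expected: int = 0
    pending: dict = field(default_factory=dict)  # buffered out-of-order requests

    def deliver(self, seq, request_id):
        # Ignore requests that were already executed; buffer everything else.
        if seq >= self.next_expected:
            self.pending[seq] = request_id
        # Execute every request that is now contiguous with the executed prefix.
        while self.next_expected in self.pending:
            self.log.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


def demo():
    sequencer = Sequencer()
    ordered = [(sequencer.order(rid), rid) for rid in ("c1-r1", "c2-r1", "c1-r2")]
    a, b = Replica(), Replica()
    for seq, rid in ordered:            # replica a receives messages in order
        a.deliver(seq, rid)
    for seq, rid in reversed(ordered):  # replica b receives them reversed
        b.deliver(seq, rid)
    return a.log, b.log
```

Calling `demo()` returns the two replicas' execution logs, which match even though the second replica received its messages in reverse. Clients and replicas only need to reach the ordering component, which is the property the abstract relies on when it says that only the middle tier must sit in a well-behaving part of the network.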
Dependable distributed systems, software replication in wide-area networks, replication protocols, architectures for dependable services.
Carlo Marchetti, Roberto Baldoni, Sara Tucci-Piergiovanni, Antonino Virgillito, "Fully Distributed Three-Tier Active Software Replication", IEEE Transactions on Parallel & Distributed Systems, vol. 17, no. 7, pp. 633-645, July 2006, doi:10.1109/TPDS.2006.89