Reliable Distributed Systems, IEEE Symposium on (2002)
Osaka University, Suita, Japan
Oct. 13, 2002 to Oct. 16, 2002
R. Jiménez-Peris, Technical University of Madrid
M. Patiño-Martínez, Technical University of Madrid
G. Alonso, Swiss Federal Institute of Technology
The increasingly widespread use of cluster architectures has resulted in many new application scenarios for data replication. While data replication is, in principle, a well-understood problem, recovery of replicated systems has not yet received enough attention. In clusters, recovery procedures are particularly important because the system must maintain a high level of availability even while recovery is under way. In fact, recovery is part of the normal operation of any cluster, since the cluster is expected to keep working while sites leave or join the system. However, traditional recovery techniques usually require processing to stop. Once a quiescent state has been reached, the system synchronizes the state of failed or new replicas. In this paper, we concentrate on how to perform recovery in a replication middleware without stopping processing. The proposed protocol focuses on minimizing the redundant work that arises when several sites recover concurrently.
R. Jiménez-Peris, M. Patiño-Martínez and G. Alonso, "Non-Intrusive, Parallel Recovery of Replicated Data," Reliable Distributed Systems, IEEE Symposium on (SRDS), Osaka University, Suita, Japan, 2002, pp. 150.