21st IEEE Symposium on Reliable Distributed Systems (SRDS'02)
Non-Intrusive, Parallel Recovery of Replicated Data
Osaka University, Suita, Japan
October 13-16, 2002
ISBN: 0-7695-1659-9
The increasingly widespread use of cluster architectures has resulted in many new application scenarios for data replication. While data replication is, in principle, a well-understood problem, recovery of replicated systems has not yet received enough attention. In the case of clusters, recovery procedures are particularly important since they have to maintain a high level of availability even during recovery. In fact, recovery is part of the normal operation of any cluster, as the cluster is expected to continue working while sites leave or join the system. However, traditional recovery techniques usually require stopping processing: once a quiescent state has been reached, the system proceeds to synchronize the state of failed or new replicas. In this paper, we concentrate on how to perform recovery in a replication middleware without having to stop processing. The proposed protocol focuses on how to minimize the redundancies that take place during concurrent recovery of several sites.
Citation:
R. Jiménez-Peris, M. Patiño-Martínez, G. Alonso, "Non-Intrusive, Parallel Recovery of Replicated Data," 21st IEEE Symposium on Reliable Distributed Systems (SRDS'02), p. 150, 2002