Issue No. 11 - November (1996 vol. 22)
DOI: http://doi.ieeecomputersociety.org/10.1109/32.553700
<p><b>Abstract</b>—In large systems, replication can become an important means to improve data access times and availability. Existing recovery protocols, on the other hand, were proposed for small-scale distributed systems. Such protocols typically update stale, newly recovered sites with replicated data and resolve the commit uncertainty of recovering sites. Given that failures are more frequent and data access times are costlier in large systems, such protocols can introduce large overheads there and should be avoided whenever possible. We call these protocols <i>dependent recovery</i> protocols, since they require a recovering site to consult other sites. Independent recovery has been studied in the context of one-copy systems and has been proven unattainable. This paper offers independent recovery protocols for large-scale systems with replicated data. It shows how the protocols can be incorporated into several well-known replication protocols and proves that these protocols continue to ensure data consistency. The paper then addresses the issue of nonblocking atomic commitment. It presents mechanisms that can reduce the overhead of termination protocols and the probability of blocking. Finally, the performance impact of the proposed recovery protocols is studied through simulation and analytical studies. The results of these studies show that the significant benefits of independent recovery can be enjoyed with a very small loss in data availability and a very small increase in the number of transaction aborts.</p>
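The abstract does not spell out the protocols themselves. As a purely illustrative sketch, and not the paper's method, the Python below uses a standard version-number quorum scheme (weighted voting in the style of Gifford) to show why a replication protocol can, under some assumptions, tolerate a stale, newly recovered copy without a catch-up step: every read quorum overlaps the last write quorum, and the highest version wins. The site names, the quorum sizes `N`, `R`, `W`, and the `read`/`write` helpers are all hypothetical. The sketch ignores the other half of dependent recovery, resolving the commit uncertainty of in-doubt transactions, which the abstract addresses separately under nonblocking atomic commitment.

```python
from dataclasses import dataclass

# Hypothetical parameters: N copies of one data item, read quorum R, write quorum W.
N, R, W = 3, 2, 2
assert R + W > N and 2 * W > N  # reads meet the last write; any two writes overlap

@dataclass
class Replica:
    """One site's copy of the item, tagged with a version number (hypothetical model)."""
    version: int = 0
    value: str = "x0"
    up: bool = True

def read(replicas, quorum):
    """Read the listed sites; the highest version wins, so a stale copy is harmless."""
    copies = [replicas[s] for s in quorum if replicas[s].up]
    if len(copies) < R:
        raise RuntimeError("read quorum unavailable")
    newest = max(copies, key=lambda c: c.version)
    return newest.value, newest.version

def write(replicas, quorum, value):
    """Install value at a write quorum under a version one higher than any it holds."""
    copies = [replicas[s] for s in quorum if replicas[s].up]
    if len(copies) < W:
        raise RuntimeError("write quorum unavailable")
    new_version = max(c.version for c in copies) + 1
    for c in copies:
        c.version, c.value = new_version, value
    return new_version

sites = {name: Replica() for name in "ABC"}
sites["C"].up = False              # site C crashes
write(sites, ["A", "B"], "x1")     # the update proceeds without C
sites["C"].up = True               # C rejoins without consulting A or B
print(read(sites, ["B", "C"]))     # ('x1', 1): B's newer copy masks C's stale one
```

Under these assumptions, C's stale copy is refreshed lazily the next time a write quorum includes it. The price of skipping the catch-up step is that every read must contact `R` sites rather than one.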
Availability, blocking atomic commitment, concurrency control, crash recovery, distributed computing, independent recovery, replication, transactions.
P. Triantafillou, "Independent Recovery in Large-Scale Distributed Systems," in IEEE Transactions on Software Engineering, vol. 22, no. 11, pp. 812-826, 1996.