21st IEEE Symposium on Reliable Distributed Systems (SRDS'02)
On Node State Reconstruction for Fault Tolerant Distributed Algorithms
Osaka University, Suita, Japan
October 13-16, 2002
ISBN: 0-7695-1659-9
Authors: Michael Okun, Amnon Barak
ISSN: 1060-9857
DOI: 10.1109/RELDIS.2002.1180184
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
One of the main methods for achieving fault tolerance in distributed systems is recovery of the state of failed components. Though generic recovery methods such as checkpointing and message logging exist, in many cases the recovery has to be application-specific. In this paper we propose a general model for node state reconstruction after crash failures. In our model the reconstruction operation is defined only by the requirements it fulfills, without reference to the specific, application-dependent way in which it is performed. The model provides a framework for the formal treatment of algorithm-specific and system-specific recovery procedures. It is used to specify node state reconstruction procedures for several widely used distributed algorithms and systems, and to prove their correctness.
Index Terms:
Distributed algorithms, fault tolerance, state reconstruction, recovery
Citation:
Michael Okun, Amnon Barak, "On Node State Reconstruction for Fault Tolerant Distributed Algorithms," 21st IEEE Symposium on Reliable Distributed Systems (SRDS'02), pp. 160, 2002.
