Issue No. 06 - June (1993 vol. 19)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/32.232022
<p>An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. A common time base is established in the system using a hardware clock synchronization algorithm. This common time base is coupled with the idea of pseudo-recovery points to develop a checkpointing algorithm that has the following advantages: a reduced wait for commitment when establishing recovery lines, fewer messages exchanged, and lower memory requirements. These advantages are assessed quantitatively by developing a probabilistic model.</p>
message exchange; distributed system; checkpointing; rollback recovery; common time base; hardware clock synchronization algorithm; pseudo-recovery points; recovery lines; memory requirement; probabilistic model; distributed processing; fault tolerant computing; system recovery
P. Ramanathan and K. Shin, "Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System," in IEEE Transactions on Software Engineering, vol. 19, no. 6, pp. 571-583, 1993.