Issue No.10 - October (1996 vol.7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.539735
<p><b>Abstract</b>—A mobile computing system consists of mobile and stationary nodes, connected to each other by a communication network. The presence of mobile nodes in the system places constraints on the permissible energy consumption and available communication bandwidth. To minimize the lost computation during recovery from node failures, periodic collection of a consistent snapshot of the system (checkpoint) is required. Locating mobile nodes contributes to the checkpointing and recovery costs. Synchronous snapshot collection algorithms, designed for static networks, either force every node in the system to take a new local snapshot, or block the underlying computation during snapshot collection. Hence, they are not suitable for mobile computing systems. If nodes take their local checkpoints independently in an uncoordinated manner, each node may have to store multiple local checkpoints in stable storage. This is not suitable for mobile nodes as they have small memory. This paper presents a synchronous snapshot collection algorithm for mobile systems that neither forces every node to take a local snapshot, nor blocks the underlying computation during snapshot collection. If a node initiates snapshot collection, local snapshots of only those nodes that have directly or transitively affected the initiator since their last snapshots need to be taken. We prove that the global snapshot collection terminates within a finite time of its invocation and the collected global snapshot is consistent. We also propose a minimal rollback/recovery algorithm in which the computation at a node is rolled back only if it depends on operations that have been undone due to the failure of node(s). Both the algorithms have low communication and storage overheads and meet the low energy consumption and low bandwidth constraints of mobile computing systems.</p>
Checkpointing, causal dependency, global snapshot, mobile computing systems, portable computers, recovery.
Ravi Prakash, "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems", IEEE Transactions on Parallel & Distributed Systems, vol.7, no. 10, pp. 1035-1048, October 1996, doi:10.1109/71.539735