Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT'05)
A Communication-Induced Checkpointing and Asynchronous Recovery Protocol for Mobile Computing Systems
Dalian, China
December 5-8, 2005
ISBN: 0-7695-2405-2
Mobile computing systems have many constraints, such as low battery power, low bandwidth, high mobility, and lack of stable storage, which are not present in static distributed systems. In this paper, we propose an efficient communication-induced checkpointing protocol for mobile computing systems. We also propose an asynchronous recovery protocol based on the checkpointing protocol. Mobile support stations control major parts of checkpointing and recovery, such as storing and tracing the checkpoints, requesting rollback, and logging messages, so that mobile hosts do not incur much overhead. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and to request only a subset of the processes to roll back to a consistent checkpoint. Our recovery protocol uses selective message logging at the mobile support station to handle the messages lost due to rollback.
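As a rough illustration of the general idea behind communication-induced checkpointing, the sketch below uses the classic index-based rule: each process piggybacks its checkpoint sequence number on outgoing messages, and a receiver takes a forced checkpoint before delivering any message that carries a higher number. This is not the protocol proposed in the paper, whose details the abstract does not give. The names `SupportStation` and `MobileProcess`, the message layout, and the choice to keep all checkpoints at the station are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SupportStation:
    """Hypothetical stand-in for a mobile support station that holds checkpoints."""
    checkpoints: dict = field(default_factory=dict)  # (pid, sn) -> saved state

    def store(self, pid, sn, state):
        # Save a copy so later changes to the process state do not alter it.
        self.checkpoints[(pid, sn)] = dict(state)


@dataclass
class MobileProcess:
    """Hypothetical process on a mobile host; checkpoints live at the station."""
    pid: str
    mss: SupportStation
    sn: int = 0                                  # checkpoint sequence number
    state: dict = field(default_factory=dict)

    def __post_init__(self):
        self.mss.store(self.pid, self.sn, self.state)  # initial checkpoint

    def basic_checkpoint(self):
        """Checkpoint taken on the process's own initiative."""
        self.sn += 1
        self.mss.store(self.pid, self.sn, self.state)

    def send(self, payload):
        """Piggyback the current sequence number on the outgoing message."""
        return {"sender": self.pid, "sn": self.sn, "payload": payload}

    def receive(self, msg):
        """Take a forced checkpoint before delivery if the sender is ahead."""
        forced = msg["sn"] > self.sn
        if forced:
            self.sn = msg["sn"]
            self.mss.store(self.pid, self.sn, self.state)
        self.state[msg["sender"]] = msg["payload"]  # deliver the message
        return forced
```

For example, if process `p` takes a basic checkpoint and then sends a message to `q`, the call `q.receive(...)` returns `True` and the station holds a checkpoint for `q` with sequence number 1, saved before the message was delivered. Storing the checkpoint at the station rather than on the host mirrors the abstract's point that mobile hosts lack stable storage.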
Index Terms:
Distributed checkpointing, mobile computing system, failure recovery, fault-tolerance.
Citation:
Tongchit Tantikul, D. Manivannan, "A Communication-Induced Checkpointing and Asynchronous Recovery Protocol for Mobile Computing Systems," Sixth International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT'05), pp. 70-74, 2005