Parallel and Distributed Processing Symposium, International (2001)
San Francisco, California, USA
Apr. 23, 2001 to Apr. 27, 2001
We consider the problem of designing rollback error recovery algorithms for dynamic, wide-area distributed systems such as the Internet. The characteristics and scale of such systems complicate both the design and the performance of these algorithms. In a wide-area environment, traditional message-passing-based algorithms incur large overhead in both network traffic and message-passing delay. In this paper, we propose a novel approach to designing checkpointing and rollback algorithms that uses mobile agents as an aid. Using mobile agents reduces the total amount of communication and allows us to design algorithms that base their decisions on the most up-to-date system information. It also allows us to develop algorithms that implement flexible and adaptive policies. We propose a mobile-agent-enabled hybrid algorithm that combines independent and coordinated checkpointing. A prototype of the algorithms is developed using IBM's Aglets, and results of the performance evaluation are presented and discussed.
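To illustrate why combining independent and coordinated checkpointing is attractive, here is a minimal, generic sketch of recovery-line computation. It is not the authors' algorithm, whose details the abstract does not give. It assumes each process numbers its checkpoints from 0 (the initial state), that checkpoint c of a process records every event in intervals before c, and that the send and receive interval of every message is logged. In a mobile-agent design, an agent could gather these logs and run the computation; that division of labor is an assumption here. Starting from the newest checkpoints, the receiver of any orphan message (received but not sent in the chosen global state) is rolled back until no orphans remain. Independent checkpoints alone can cascade back to the initial state (the domino effect). A coordinated checkpoint is consistent by construction, so it bounds how far the rollback can go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    send_interval: int  # sent after the sender's checkpoint #send_interval
    receiver: int
    recv_interval: int  # received after the receiver's checkpoint #recv_interval


def is_orphan(m: Message, line: list[int]) -> bool:
    """True if the global state `line` records the receive but not the send."""
    received = m.recv_interval < line[m.receiver]
    sent = m.send_interval < line[m.sender]
    return received and not sent


def is_consistent(line: list[int], messages: list[Message]) -> bool:
    return not any(is_orphan(m, line) for m in messages)


def recovery_line(latest: list[int], messages: list[Message]) -> list[int]:
    """Most recent consistent global checkpoint at or below `latest`.

    latest[p] is the newest usable checkpoint index of process p.
    """
    line = list(latest)
    changed = True
    while changed:
        changed = False
        for m in messages:
            if is_orphan(m, line):
                # Roll the receiver back to the newest checkpoint taken
                # before the receive, which "un-receives" the message.
                line[m.receiver] = m.recv_interval
                changed = True
    return line


# Independent checkpoints only: P0 can restart from checkpoint 1 at best,
# and the resulting orphans cascade both processes back to the start.
domino = [Message(0, 1, 1, 0), Message(1, 0, 0, 0)]
print(recovery_line([1, 2], domino))

# Both processes took a coordinated checkpoint #1, so rollback stops there.
hybrid = [Message(1, 0, 0, 0), Message(0, 1, 1, 1)]
print(recovery_line([1, 2], hybrid))
```

The first call prints `[0, 0]` and the second prints `[1, 1]`. Because rollback only moves a process as far as an orphan forces it, the result is the latest consistent global state. The coordinated checkpoint therefore acts as a floor on the rollback.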
G. Chan, W. Jia, J. Cao and T. S. Dillon, "Checkpointing and Rollback of Wide-Area Distributed Applications Using Mobile Agents," Parallel and Distributed Processing Symposium, International (IPDPS), San Francisco, California, USA, 2001, pp. 10014a.