14th International Conference on Distributed Computing Systems (1994)
Poznan, Poland
June 21, 1994 to June 24, 1994
ISBN: 0-8186-5840-1
pp: 235-242
Cheng-Ru Young, Dept. of Electr. Eng. & Technol., Nat. Taiwan Inst. of Technol., Taipei, Taiwan
Ge-Ming Chiu, Dept. of Electr. Eng. & Technol., Nat. Taiwan Inst. of Technol., Taipei, Taiwan
ABSTRACT
In this paper we propose a new mechanism for implementing checkpoint/rollback-recovery in a distributed computing system. A logical-ring structure is introduced for the maintenance of recovery-related information. The message processing order of a process is maintained by all other processes on its associated ring, so no time-consuming writes of order information to stable storage are required. As a result, failure-free overhead is small. When failures occur, only the failed processes have to roll back to their latest checkpoints; surviving processes continue execution without being blocked. Output commit is fast because it needs no synchronization before a message is sent to the outside world.
INDEX TERMS
message passing, system recovery, distributed processing, fault tolerant computing, software reliability
CITATION

Cheng-Ru Young and Ge-Ming Chiu, "A crash recovery technique in distributed computing systems," 14th International Conference on Distributed Computing Systems (ICDCS), Poznan, Poland, 1994, pp. 235-242.
doi:10.1109/ICDCS.1994.302417