Fault-Tolerant Computing, International Symposium on (1999)
Madison, Wisconsin
June 15, 1999 to June 18, 1999
ISSN: 0731-3071
ISBN: 0-7695-0213-X
pp: 242
Lorenzo Alvisi, University of Texas at Austin
Sriram Rao, University of Texas at Austin
Syed Amir Husain, University of Texas at Austin
Asanka de Mel, University of Texas at Austin
ABSTRACT
Communication-induced checkpointing (CIC) allows processes in a distributed computation to take independent checkpoints and to avoid the domino effect. This paper presents an analysis of CIC protocols based on a prototype implementation and validated simulations. Our results indicate that there is sufficient evidence to suspect that much of the conventional wisdom about these protocols is questionable.
INDEX TERMS
Checkpointing, Rollback Recovery, Performance Evaluation, MPI, Consistent Global States
CITATION

L. Alvisi, A. de Mel, S. A. Husain, E. Elnozahy and S. Rao, "An Analysis of Communication-Induced Checkpointing," Fault-Tolerant Computing, International Symposium on (FTCS), Madison, Wisconsin, 1999, pp. 242.
doi:10.1109/FTCS.1999.781058