This article compares several strategies for storing checkpoint data from parallel applications in an opportunistic grid environment. The authors evaluate replication, parity information, and erasure coding in terms of computational overhead, storage overhead, and degree of fault tolerance. They implement these strategies and run the evaluation experiments on InteGrade, an object-oriented grid middleware.
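To make the trade-off between storage overhead and fault tolerance concrete, the following is a minimal sketch of the parity strategy, assuming simple XOR parity over equal-sized fragments. The function names (`split`, `parity`, `recover`, `storage_factor`), the fragment count, and the sample data are hypothetical illustrations, not taken from the article or from InteGrade.

```python
from functools import reduce


def split(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal-sized fragments, zero-padding the tail."""
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(m)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(fragments: list[bytes]) -> bytes:
    """Compute one XOR parity fragment over all data fragments."""
    return reduce(xor_bytes, fragments)


def recover(surviving: list[bytes], parity_fragment: bytes) -> bytes:
    """Rebuild the single missing fragment from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity_fragment)


def storage_factor(m: int, extra: int) -> float:
    """Total stored size divided by original size, for m data fragments
    plus `extra` redundant fragments of the same size (assumed model)."""
    return (m + extra) / m


# Hypothetical checkpoint split into 4 fragments (an assumed value).
checkpoint = b"checkpoint state of a parallel process"
fragments = split(checkpoint, 4)
p = parity(fragments)

# Simulate losing fragment 2 and rebuilding it from the rest.
lost = fragments[2]
rebuilt = recover(fragments[:2] + fragments[3:], p)
assert rebuilt == lost
```

Under this simplified model, one parity fragment over 4 data fragments stores 1.25 times the original size and tolerates one lost fragment, while keeping two full copies stores 2 times the original size. An erasure code with m data and k redundant fragments would store (m + k) / m times the original and tolerate up to k losses, at a higher coding cost.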
Index Terms:
fault tolerance, distributed storage, data coding, checkpointing, grid computing
Citation:
Raphael Y. de Camargo, Fabio Kon, Renato Cerqueira, "Strategies for Checkpoint Storage on Opportunistic Grids," IEEE Distributed Systems Online, vol. 7, no. 9, pp. 1, Sept. 2006, doi:10.1109/MDSO.2006.56