Advances in Parallel and Distributed Computing Conference (1997)
Mar. 19, 1997 to Mar. 21, 1997
Roberto Baldoni, IRISA, Campus de Beaulieu
Jean-Michel Helary, IRISA, Campus de Beaulieu
Achour Mostefaoui, IRISA, Campus de Beaulieu
Michel Raynal, IRISA, Campus de Beaulieu
In many systems, backward recovery is a classical technique for ensuring fault tolerance. It consists of restoring a computation to a consistent global state, saved in a global checkpoint, from which the computation can be resumed. A global checkpoint is a set of local checkpoints, one per process, each corresponding to a local state saved on stable storage. In this paper, we formally define the domino effect for shared memory systems, whether the shared memory is physical (as in multiprocessor systems) or virtual (as in distributed shared memory systems), and we design a domino-free adaptive algorithm. These results rest on a necessary and sufficient condition stating when a set of local checkpoints can belong to some consistent global checkpoint.
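As a minimal sketch of the consistency notion the abstract relies on, the Python snippet below checks whether a candidate global checkpoint (one local checkpoint per process) is consistent under the classical definition: no local checkpoint causally precedes another. This is not the paper's shared-memory condition, which is not reproduced here. It assumes each local checkpoint is stamped with a vector clock and, as a simplifying assumption, that reading a value written by another process creates a causal dependency in the same way a message receipt does. The names `happened_before` and `is_consistent` are hypothetical.

```python
from typing import Sequence

# A vector clock: entry k counts the events of process k known to the stamped event.
VectorClock = Sequence[int]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def is_consistent(checkpoints: Sequence[VectorClock]) -> bool:
    """A global checkpoint (one local checkpoint per process) is consistent
    when no local checkpoint causally precedes another one."""
    return not any(
        happened_before(ci, cj)
        for i, ci in enumerate(checkpoints)
        for j, cj in enumerate(checkpoints)
        if i != j
    )


if __name__ == "__main__":
    # P0 writes x at (1,0) and checkpoints at (2,0);
    # P1 reads x at (1,1) and checkpoints at (1,2): the checkpoints are concurrent.
    print(is_consistent([(2, 0), (1, 2)]))  # True
    # P0 checkpoints at (1,0) before writing x at (2,0);
    # P1 reads x at (2,1) and checkpoints at (2,2): the read is saved, the write is not.
    print(is_consistent([(1, 0), (2, 2)]))  # False
```

In the second case, restoring both checkpoints would leave process 1 holding a value that, from process 0's restored point of view, was never written, so the two checkpoints cannot belong to the same consistent global checkpoint.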
Consistent global state, domino effect, fault-tolerance, memory consistency, shared memory.
A. Mostefaoui, M. Raynal, J.-M. Helary and R. Baldoni, "Consistent State Restoration in Shared Memory Systems," Advances in Parallel and Distributed Computing Conference (APDC), Shanghai, China, 1997, pp. 330.