Issue No. 09 - September (1997 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.615441
<p><b>Abstract</b>—Distributed Shared Virtual Memory (<scp>DSVM</scp>) systems provide a shared memory abstraction on distributed memory architectures. Such systems ease parallel application programming because the shared-memory programming model is often more natural than the message-passing paradigm. However, the probability of failure of a <scp>DSVM</scp> increases with the number of sites. Thus, fault tolerance mechanisms must be implemented in order to allow processes to continue their execution in the event of a failure. This paper gives an overview of <it>recoverable</it> <scp>DSVM</scp>s (<scp>RDSVM</scp>s) that provide a checkpointing mechanism to restart parallel computations in the event of a site failure.</p>
Distributed systems, distributed shared virtual memory, availability, backward error recovery, consistent states.
C. Morin and I. Puaut, "A Survey of Recoverable Distributed Shared Virtual Memory Systems," in IEEE Transactions on Parallel & Distributed Systems, vol. 8, no. 9, pp. 959-969, 1997.