Issue No.09 - September (1997 vol.8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/71.615441
<p><b>Abstract</b>: Distributed Shared Virtual Memory (<scp>DSVM</scp>) systems provide a shared memory abstraction on distributed memory architectures. Such systems ease parallel application programming because the shared-memory programming model is often more natural than the message-passing paradigm. However, the probability of failure of a <scp>DSVM</scp> increases with the number of sites. Thus, fault tolerance mechanisms must be implemented in order to allow processes to continue their execution in the event of a failure. This paper gives an overview of <it>recoverable</it> <scp>DSVM</scp>s (<scp>RDSVM</scp>s) that provide a checkpointing mechanism to restart parallel computations in the event of a site failure.</p>
Index Terms: Distributed systems, distributed shared virtual memory, availability, backward error recovery, consistent states.
Christine Morin, Isabelle Puaut, "A Survey of Recoverable Distributed Shared Virtual Memory Systems", IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 9, pp. 959-969, September 1997, doi:10.1109/71.615441