1999 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '99)
Fault-Tolerance Using Cache-Coherent Distributed Shared Memory Systems
Fremantle, Australia
June 23-June 25
ISBN: 0-7695-0231-8
D.L. Hecht, The University of Alabama in Huntsville
K.M. Kavi, The University of Alabama in Huntsville
R.K. Gaede, The University of Alabama in Huntsville
C. Katsinis, Drexel University
In this paper, we describe new protocols that augment traditional cache-coherency mechanisms to implement fault tolerance based on Recovery Blocks and checkpointing. Concurrent processes complicate rollback recovery, since a rollback can lead to a "domino effect" whereby a process is rolled back all the way to its beginning. Several approaches have been proposed to limit the domino effect. One set of such techniques requires communicating processes to synchronize periodically in order to checkpoint a globally consistent state. These schemes can be implemented more naturally on distributed shared memory systems, using synchronization on shared memory. We have developed extensions to well-known cache-coherency methods (e.g., directory-based protocols) for checkpointing consistent states.
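The idea of synchronizing periodically to checkpoint a globally consistent state can be illustrated with a minimal sketch. It is not the protocol described in the paper, which extends cache-coherency hardware mechanisms. It only assumes threads standing in for communicating processes and a barrier standing in for synchronization on shared memory. The names `Worker`, `run`, and `demo` are hypothetical.

```python
import copy
import threading


class Worker:
    """One communicating process with private state (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0}
        self.checkpoint = copy.deepcopy(self.state)

    def save(self):
        # Record the current state as the latest recovery point.
        self.checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the state saved at the latest recovery point.
        self.state = copy.deepcopy(self.checkpoint)


def run(worker, barrier, steps, checkpoint_every):
    for step in range(1, steps + 1):
        worker.state["counter"] += 1  # stand-in for real computation
        if step % checkpoint_every == 0:
            barrier.wait()  # every worker has reached the same step
            worker.save()
            barrier.wait()  # nobody resumes until all snapshots exist


def demo(n_workers=3, steps=7, checkpoint_every=3):
    workers = [Worker(f"p{i}") for i in range(n_workers)]
    barrier = threading.Barrier(n_workers)
    threads = [
        threading.Thread(target=run, args=(w, barrier, steps, checkpoint_every))
        for w in workers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    before = [w.state["counter"] for w in workers]
    for w in workers:  # simulate a fault: every worker rolls back
        w.rollback()
    after = [w.state["counter"] for w in workers]
    return before, after


if __name__ == "__main__":
    print(demo())
```

Because all workers save their state at the same synchronization point, a rollback returns every worker to the same recovery line (step 6 in this run) rather than cascading back to the initial state.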
Index Terms:
Checkpointing, Backward recovery, Cache-Coherency, Conversations, Recovery Blocks, Directory-Based Protocols, Distributed Shared Memory
Citation:
D.L. Hecht, K.M. Kavi, R.K. Gaede, C. Katsinis, "Fault-Tolerance Using Cache-Coherent Distributed Shared Memory Systems," 1999 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '99), p. 100, 1999