Issue No. 10 - October (1994 vol. 5)
pp: 1033-1043
ABSTRACT
Several variations of cache-based checkpointing for rollback error recovery from transient errors in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated in this paper. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, we evaluate the performance effect of integrating the recovery schemes in the cache coherence protocol. Our results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead, but with uncontrollable high variability in the checkpoint interval.
INDEX TERMS
buffer storage; shared memory systems; virtual machines; redundancy; system recovery; performance evaluation; cache-based error recovery performance; multiprocessors; cache-based checkpointing; rollback error recovery; transient errors; shared-memory multiprocessors; cache replacement policy; inherent redundancy; memory hierarchy; computation state; rollback propagation; address traces; parallel applications; Encore Multimax; recovery schemes; cache coherence protocol; cache-based schemes; low performance overhead; checkpoint interval
CITATION
B. Janssens, W.K. Fuchs, "The Performance of Cache-Based Error Recovery in Multiprocessors", IEEE Transactions on Parallel & Distributed Systems, vol. 5, no. 10, pp. 1033-1043, October 1994, doi:10.1109/71.313120