Issue No.08 - August (1994 vol.5)
pp: 874-879
ABSTRACT
Presents the results of an implementation of several algorithms for checkpointing and restarting parallel programs on shared-memory multiprocessors. The algorithms are compared according to the metrics of overall checkpointing time, overhead imposed by the checkpointer on the target program, and amount of time during which the checkpointer interrupts the target program. The best algorithm measured achieves its efficiency through a variation of copy-on-write, which allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.
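The overlap that copy-on-write makes possible can be illustrated with a minimal sketch. It is not the algorithm measured in the paper: it swaps in the operating system's process-level copy-on-write via fork() on a POSIX system, and the names checkpoint_concurrently, restart, and demo are hypothetical. The parent is interrupted only for the duration of the fork; the child holds a frozen view of memory and performs the slow write to disk while the parent keeps running.

```python
import os
import pickle
import tempfile


def checkpoint_concurrently(state, path):
    """Fork a child that writes `state` to `path`; return the child's pid.

    Assumption: a POSIX system where fork() shares pages copy-on-write,
    so the child sees `state` exactly as it was at the moment of the fork.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: do the time-consuming write, then exit immediately.
        status = 1
        try:
            with open(path, "wb") as f:
                pickle.dump(state, f)
            status = 0
        finally:
            os._exit(status)
    # Parent process: returns right away and continues computing.
    return pid


def restart(path):
    """Load the state saved by a completed checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    state = {"iteration": 0, "data": list(range(1000))}
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")

    pid = checkpoint_concurrently(state, path)

    # The parent mutates its state while the child may still be writing.
    state["iteration"] = 1
    state["data"][0] = -1

    # Wait for the checkpoint to finish before trying to restore it.
    _, status = os.waitpid(pid, 0)
    restored = restart(path)
    return status, state, restored


if __name__ == "__main__":
    status, state, restored = demo()
    print(status, state["iteration"], restored["iteration"])
```

In this sketch the restored state reflects the program as of the fork, regardless of what the parent changed afterward, which is the property a concurrent checkpointer needs.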
INDEX TERMS
parallel programming; fault tolerant computing; software reliability; system recovery; program diagnostics; low latency concurrent checkpointing; parallel programs; program restarting; shared-memory multiprocessors; metrics; overall checkpointing time; overhead; interruption time; efficiency; copy-on-write; overlapping operations; fault tolerance; backward error recovery
CITATION
K. Li, J.F. Naughton, J.S. Plank, "Low-Latency, Concurrent Checkpointing for Parallel Programs", IEEE Transactions on Parallel & Distributed Systems, vol.5, no. 8, pp. 874-879, August 1994, doi:10.1109/71.298215