20th IEEE International Conference on Distributed Computing Systems (ICDCS'00)
On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading
Taipei, Taiwan
April 10-April 13
ISBN: 0-7695-0601-1
Ann T. Tai, IA Tech, Inc.
Kam S. Tso, IA Tech, Inc.
Leon Alkalai, California Institute of Technology
Savio N. Chau, California Institute of Technology
William H. Sanders, University of Illinois at Urbana-Champaign
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containment and recovery for GSU. To keep development cost low, we exploit inherent system resource redundancies as the means of fault tolerance. To mitigate the effect of residual software faults at low performance cost, we take a crucial step in devising error containment and recovery methods by introducing the “confidence-driven” notion. This notion complements the message-driven (or “communication-induced”) approach employed by a number of existing checkpointing protocols for tolerating hardware faults. In particular, we discriminate between the individual software components with respect to our confidence in their reliability, and keep track of changes in our confidence in particular processes (due to knowledge about potential process state contamination). This, in turn, enables the individual processes in the spaceborne distributed system to decide locally, at run-time, whether to establish a checkpoint upon message passing and whether to roll back or roll forward during error recovery. The resulting message-driven, confidence-driven approach enables cost-effective checkpointing and cascading-rollback-free recovery.
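As a rough illustration of the local decisions described above, the following Python sketch models a high-confidence process that checkpoints only when a message from a potentially contaminated sender is about to influence its clean state, and that chooses between rollback and roll-forward at recovery time. This is a toy model under stated assumptions, not the protocol from the paper: the `Process` class, the `suspect` flag, the integer state, and the `sender_suspect` argument are all hypothetical names introduced here, and recovery of the low-confidence component itself is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    """Hypothetical model of one high-confidence process (not the paper's design)."""
    name: str
    state: int = 0
    suspect: bool = False             # True once the state may be contaminated
    checkpoint: Optional[int] = None  # last saved clean state, if any
    checkpoints_taken: int = 0

    def receive(self, payload: int, sender_suspect: bool) -> None:
        # Assumption: checkpoint only when a clean state is about to be
        # influenced by a sender whose state may be contaminated.
        if sender_suspect and not self.suspect:
            self.checkpoint = self.state
            self.checkpoints_taken += 1
            self.suspect = True
        self.state += payload  # toy state update

    def recover(self) -> str:
        # Assumption: a potentially contaminated process restores its own
        # checkpoint; a clean process keeps its current state.
        if self.suspect and self.checkpoint is not None:
            self.state = self.checkpoint
            self.suspect = False
            return "roll back"
        return "roll forward"


a = Process("A")
c = Process("C")
a.receive(5, sender_suspect=False)  # clean sender: no checkpoint
a.receive(3, sender_suspect=True)   # suspect sender: checkpoint, then accept
a.receive(1, sender_suspect=True)   # already suspect: no further checkpoint
c.receive(1, sender_suspect=False)  # C only ever hears from clean senders

decision_a = a.recover()
decision_c = c.recover()
```

In this sketch, process A saves exactly one checkpoint (the state just before the first suspect message) and restores it on recovery, while process C never checkpoints and simply continues. Because each process restores only its own locally saved clean state, no process forces another to roll back, which is the intuition behind recovery without cascading rollback.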
Index Terms:
Long-life applications, guarded software upgrading, error containment and recovery, inherent resource redundancies, checkpointing
Citation:
Ann T. Tai, Kam S. Tso, Leon Alkalai, Savio N. Chau, William H. Sanders, "On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading," 20th IEEE International Conference on Distributed Computing Systems (ICDCS'00), p. 548, 2000