Eighth Workshop on Hot Topics in Operating Systems
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel
Elmau, Germany
May 20-22, 2001
ISBN: 0-7695-1040-X
George Candea, Stanford University
Armando Fox, Stanford University
Abstract: Even after decades of software engineering research, complex computer systems still fail, primarily due to nondeterministic bugs that are typically resolved by rebooting. Conceding that Heisenbugs will remain a fact of life, we propose a systematic investigation of restarts as "high availability medicine." In this paper we show how recursive restartability (RR) - the ability of a system to gracefully tolerate restarts at multiple levels - improves fault tolerance, reduces time-to-repair, and enables system designers to build flexible, highly available software infrastructures. Using several examples of widely deployed software systems, we identify properties that are required of RR systems and outline an agenda for turning the recursive restartability philosophy into a practical software structuring tool. Finally, we describe infrastructural support for RR systems, along with initial ideas on how to analyze and benchmark such systems.
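As an illustration of restarts "at multiple levels", the sketch below models a system as a tree of components and recovers from a failure by restarting the smallest enclosing subtree first, escalating to the parent only if the fault persists. This is one possible reading of the idea, not the authors' implementation: the abstract does not specify a mechanism, and the names `Component`, `recover`, and `is_healthy`, as well as the example component names and fault model, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Component:
    """A node in a hypothetical restart tree (illustrative names only)."""
    name: str
    children: List["Component"] = field(default_factory=list)
    parent: Optional["Component"] = None
    restarts: int = 0

    def add(self, child: "Component") -> "Component":
        """Attach a child component and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def restart(self) -> List[str]:
        """Restart this component and its whole subtree; return the names restarted."""
        self.restarts += 1
        restarted = [self.name]
        for child in self.children:
            restarted.extend(child.restart())
        return restarted


def recover(failed: Component, is_healthy: Callable[[], bool]) -> List[List[str]]:
    """Restart the failed component, then ever-larger enclosing subtrees,
    stopping as soon as the (assumed) health check passes."""
    attempts = []
    node = failed
    while node is not None:
        attempts.append(node.restart())
        if is_healthy():
            break
        node = node.parent
    return attempts


# Example tree: system -> {frontend -> {cache}, database}
system = Component("system")
frontend = system.add(Component("frontend"))
cache = frontend.add(Component("cache"))
database = system.add(Component("database"))

# Assumed fault model: the fault clears only once "frontend" has been restarted.
attempts = recover(cache, lambda: frontend.restarts > 0)
print(attempts)  # [['cache'], ['frontend', 'cache']]
```

In this run the cheap restart of `cache` alone does not clear the fault, so recovery escalates one level to `frontend`; `database` and the top-level `system` are never restarted, which is the "scalpel" behaviour the title alludes to.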
Citation:
George Candea, Armando Fox, "Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel," HotOS, p. 125, Eighth Workshop on Hot Topics in Operating Systems, 2001