Issue No.08 - August (2006 vol.32)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2006.73
Restarts or retries are a common phenomenon in computing systems, for instance, in preventive maintenance, software rejuvenation, or when a failure is suspected. Typically, one sets a time-out to trigger the restart. We analyze and optimize time-out strategies for scenarios in which the expected remaining time of a task does not always decrease with the time already invested in it. Examples of such tasks include the download of Web pages, randomized algorithms, distributed queries, and jobs subject to network or other failures. Assuming that the completion times of successive tries are independent, we derive computationally attractive expressions for the moments of the completion time, as well as for the probability that a task meets a deadline. These expressions facilitate efficient algorithms to compute optimal restart strategies and are promising candidates for pragmatic online optimization of restart timers.
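To make the first-moment idea concrete, the following is a minimal sketch and not the paper's own expressions or algorithms. It assumes a fixed time-out tau, a fixed restart cost c, and independent, identically distributed tries with completion-time distribution F. Under those assumptions a standard renewal argument gives E[T_tau] = (integral from 0 to tau of (1 - F(t)) dt + c (1 - F(tau))) / F(tau). The function names, the trapezoidal integration, the grid search, and the example distributions are all illustrative choices.

```python
import math


def expected_completion_time(cdf, timeout, restart_cost=0.0, steps=10_000):
    """Expected completion time when a task is restarted at a fixed timeout.

    Assumes independent, identically distributed tries (an assumption of this
    sketch). `cdf` is the completion-time distribution F of a single try.
    """
    h = timeout / steps
    # Trapezoidal rule for the integral of the survival function 1 - F(t).
    survival = [1.0 - cdf(i * h) for i in range(steps + 1)]
    integral = h * (sum(survival) - 0.5 * (survival[0] + survival[-1]))
    p_success = cdf(timeout)  # probability that one try finishes in time
    if p_success <= 0.0:
        return math.inf  # no try can ever finish before the timeout
    return (integral + restart_cost * (1.0 - p_success)) / p_success


def best_timeout(cdf, candidates, restart_cost=0.0):
    """Pick the candidate timeout with the smallest expected completion time."""
    return min(
        candidates,
        key=lambda tau: expected_completion_time(cdf, tau, restart_cost),
    )


def exponential_cdf(rate):
    """Memoryless case: restarting neither helps nor hurts when cost is zero."""
    return lambda t: 1.0 - math.exp(-rate * t)


def hyperexponential_cdf(p, fast_rate, slow_rate):
    """Mixture of a fast and a slow exponential; restarts can help here."""
    return lambda t: (
        1.0
        - p * math.exp(-fast_rate * t)
        - (1.0 - p) * math.exp(-slow_rate * t)
    )


if __name__ == "__main__":
    mixed = hyperexponential_cdf(p=0.9, fast_rate=10.0, slow_rate=0.1)
    grid = [0.1 * k for k in range(1, 101)]  # hypothetical candidate timeouts
    tau = best_timeout(mixed, grid, restart_cost=0.05)
    print(tau, expected_completion_time(mixed, tau, restart_cost=0.05))
```

For the exponential distribution with zero restart cost the result equals the plain mean for any time-out, which is a useful sanity check. For the mixture, where a try is usually fast but occasionally very slow, a short time-out lowers the expected completion time below the mean without restarts.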
Restart, software rejuvenation, time-out, fault-tolerant systems, performance and reliability modeling, completion time, adaptive systems, self-management.
Aad P.A. van Moorsel, Katinka Wolter, "Analysis of Restart Mechanisms in Software Systems", IEEE Transactions on Software Engineering, vol. 32, no. 8, pp. 547-558, August 2006, doi:10.1109/TSE.2006.73