19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 16
Proactive Fault Handling for System Availability Enhancement
Denver, Colorado
April 4-8, 2005
ISBN: 0-7695-2312-9
Proactive fault handling combines prevention and repair actions with failure prediction techniques. We extend the standard availability formula with five key measures: (1) precision and (2) recall assess failure prediction, while failure handling is gauged by (3) prevention probability, (4) repair time improvement, and (5) risk of introducing additional failures. We give a short survey of actions that are suited to being combined with failure prediction and provide a procedure to estimate the five key measures. Altogether, this makes it possible to quantify the impact of proactive fault handling on system availability and may provide valuable input for system design.
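For orientation, the two prediction measures and the baseline that the paper extends can be sketched as follows. This is a minimal illustration that assumes the textbook definitions of precision, recall, and steady-state availability (MTTF / (MTTF + MTTR)); the function and parameter names are hypothetical, and the paper's extended formula is not given in this abstract, so it is not reproduced here.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of failure warnings that were followed by a real failure."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real failures that were announced by a warning."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def availability(mttf: float, mttr: float) -> float:
    """Textbook steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    print(precision(true_pos=8, false_pos=2))
    print(recall(true_pos=8, false_neg=8))
    print(availability(mttf=99.0, mttr=1.0))
```

The remaining three measures (prevention probability, repair time improvement, and risk of introducing additional failures) depend on the procedure described in the full paper and are left out of this sketch.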
Citation:
Felix Salfner, Miroslaw Malek, "Proactive Fault Handling for System Availability Enhancement," ipdps, vol. 17, p. 281a, 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 16, 2005