Issue No.02 - February (1997 vol.46)

pp: 229-234

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/12.565606

ABSTRACT

A $k$-resilient system with $N$ components can tolerate up to $k$ component failures and still function correctly. We consider $k$-resilient systems in which the number of tolerable component failures is a constant fraction of the total number of components, that is, $k = N/c$, where $c$ is a constant such that $2 \le c < \infty$. Under a Markovian assumption of constant failure and repair rates, we compute the system size $N_{max}$ at which the mean time to failure (MTTF) of such a system is maximized. Our results indicate that $N_{max}$ can be expressed in terms of the constant $c$ and the parameter $\rho$ as $N_{max} = K(c,\rho)/\rho$, where $\rho = \lambda/\mu$ and $K(c,\rho)$ is a function of $c$ and $\rho$. In addition, we find that the variation of $N_{max}$ over the whole range of $c$ is remarkably small; as a result, even if the resilience $k$ of a system as a function of $N$ varies widely, the system size at which the MTTF is maximized lies between $0.36/\rho$ and $0.5/\rho$. We validate our results through event-driven simulation and, in addition, examine the behavior of systems with Weibull-distributed failure times.
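The kind of computation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' model: the abstract does not spell out the Markov chain, so the sketch assumes that each working component fails at rate λ, that a single repair facility restores failed components at rate μ, that k is taken as the floor of N/c, and that the system fails once k + 1 components are down. The function names `mttf` and `best_size` are hypothetical.

```python
import math


def mttf(n, c, lam, mu):
    """Mean time to reach k + 1 failed components, starting with none failed.

    Assumed birth-death chain (not taken from the paper):
      state i = number of failed components,
      failure rate in state i: (n - i) * lam,
      repair rate in state i >= 1: mu (single repair facility),
      k = floor(n / c); state k + 1 is absorbing (system failure).
    """
    k = math.floor(n / c)
    total = 0.0
    h_prev = 0.0  # expected time to move from state i - 1 to state i
    for i in range(k + 1):
        fail_rate = (n - i) * lam
        repair_rate = mu if i > 0 else 0.0
        # First-passage recurrence: h_i = (1 + d_i * h_{i-1}) / b_i
        h_i = (1.0 + repair_rate * h_prev) / fail_rate
        total += h_i
        h_prev = h_i
    return total


def best_size(c, lam, mu, n_limit):
    """Scan n = 1..n_limit and return (n, MTTF) with the largest MTTF."""
    return max(((n, mttf(n, c, lam, mu)) for n in range(1, n_limit + 1)),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    lam, mu = 0.01, 1.0  # illustrative rates, so rho = lam / mu = 0.01
    for c in (2, 4, 8):
        n_best, t_best = best_size(c, lam, mu, n_limit=200)
        print(f"c={c}: N_max={n_best}, N_max*rho={n_best * lam / mu:.2f}, "
              f"MTTF={t_best:.3g}")
```

Under these assumptions the product `N_max * rho` printed by the script plays the role of $K(c,\rho)$ and can be compared with the 0.36 to 0.5 band quoted in the abstract; a different repair model would give different numbers.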

INDEX TERMS

Mean time to failure, k-resilient systems, Weibull distribution, Markov chains.

CITATION

José Fridman, Sampath Rangarajan, "Maximizing Mean-Time to Failure in k-Resilient Systems with Repair," *IEEE Transactions on Computers*, vol. 46, no. 2, pp. 229-234, February 1997, doi:10.1109/12.565606