Maximizing Mean-Time to Failure in k-Resilient Systems with Repair
February 1997 (vol. 46 no. 2)
pp. 229-234

Abstract: A k-resilient system with N components can tolerate up to k component failures and still function correctly. We consider k-resilient systems in which the number of tolerable component failures is a constant fraction of the total number of components, that is, $k = N/c$, where c is a constant such that $2 \le c < \infty$. Under a Markovian assumption of constant failure and repair rates, we compute the system size $N_{max}$ at which the mean-time to failure (MTTF) for such a system is maximized. Our results indicate that $N_{max}$ can be expressed in terms of the constant c and the parameter ρ as $N_{max} = K(c,\rho)/\rho$, where $\rho = \lambda/\mu$ is the ratio of the failure rate λ to the repair rate μ and K(c, ρ) is a function of c and ρ. In addition, we have found that the variation of $N_{max}$ over the whole range of c is remarkably small; as a result, even if the resilience k of a system as a function of N varies widely, the system size at which the MTTF is maximized lies between $0.36/\rho$ and $0.5/\rho$. We validate our results through event-driven simulation and, in addition, examine the behavior of systems with Weibull distributed failure times.
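
To make the quantity being maximized concrete, the following is a minimal sketch of an MTTF calculation for a birth-death Markov chain. It is not the authors' model, which the abstract does not specify in detail. The assumptions here are: each working component fails independently at rate `lam`, so the failure rate with i components down is (N - i)·`lam`; a single repair facility restores one component at a time at rate `mu`; and the system fails when failure number k + 1 occurs. The names `mttf` and `best_size` and the search bound `n_limit` are hypothetical.

```python
def mttf(n, k, lam, mu):
    """Expected time until k + 1 of n components are down, starting with 0 down.

    Assumed model (not taken from the paper): failure rate (n - i) * lam in
    state i, and a single repair facility with rate mu whenever i >= 1.
    Requires 0 <= k < n and lam > 0.
    """
    total = 0.0
    t_prev = 0.0  # expected time to go from state i - 1 to state i
    for i in range(k + 1):
        fail = (n - i) * lam
        repair = mu if i > 0 else 0.0
        # First-passage recurrence for a birth-death chain:
        # T_i = 1 / fail_i + (repair_i / fail_i) * T_{i-1}
        t_i = 1.0 / fail + (repair / fail) * t_prev
        total += t_i
        t_prev = t_i
    return total


def best_size(c, lam, mu, n_limit):
    """Brute-force search for the size n (a multiple of c, with k = n // c)
    that maximizes mttf under the assumed model, for n up to n_limit."""
    sizes = range(c, n_limit + 1, c)
    return max(sizes, key=lambda n: mttf(n, n // c, lam, mu))
```

Calling, for example, `best_size(2, 0.01, 1.0, 200)` scans even sizes up to 200 with ρ = 0.01. Because the repair model above is an assumption, the maximizing size it returns need not match the range reported in the abstract.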

Index Terms:
Mean time to failure, k-resilient systems, Weibull distribution, Markov chains.
José Fridman, Sampath Rangarajan, "Maximizing Mean-Time to Failure in k-Resilient Systems with Repair," IEEE Transactions on Computers, vol. 46, no. 2, pp. 229-234, Feb. 1997, doi:10.1109/12.565606
