Maximizing Mean-Time to Failure in k-Resilient Systems with Repair
February 1997 (vol. 46 no. 2)
pp. 229-234

Abstract: A k-resilient system with N components can tolerate up to k component failures and still function correctly. We consider k-resilient systems in which the number of tolerable component failures is a constant fraction of the total number of components, that is, $k = N/c$, where c is a constant such that $2 \le c < \infty$. Under a Markovian assumption of constant failure and repair rates, we compute the system size $N_{\max}$ at which the mean-time to failure (MTTF) of such a system is maximized. Our results indicate that $N_{\max}$ can be expressed in terms of the constant c and the parameter ρ as $N_{\max} = K(c,\rho)/\rho$, where $\rho = \lambda/\mu$ is the ratio of the failure rate λ to the repair rate μ, and $K(c,\rho)$ is a function of c and ρ. In addition, we have found that the variation of $N_{\max}$ over the whole range of c is remarkably small; as a result, even if the resilience k of a system as a function of N varies widely, the system size at which the MTTF is maximized lies between $0.36/\rho$ and $0.5/\rho$. We validate our results through event-driven simulation and, in addition, examine the behavior of systems with Weibull distributed failure times.
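
The kind of computation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' derivation. It assumes a birth-death Markov chain whose state is the number of failed components, a per-component failure rate `lam`, and a single repair facility working at rate `mu` whenever at least one component is down. The repair model, the restriction to integer c, and the function names `mttf` and `best_size` are all assumptions made for illustration. The MTTF is taken as the expected time to go from 0 to k+1 failed components.

```python
def mttf(n, k, lam, mu):
    """Expected time for a birth-death chain to go from 0 to k+1 failed components.

    Assumptions (not taken from the abstract): with i components down, failures
    occur at rate (n - i) * lam, and a single repair facility works at rate mu.
    """
    total = 0.0
    step = 0.0  # expected time to move from i failed to i + 1 failed
    for i in range(k + 1):
        up = (n - i) * lam            # aggregate failure rate in state i
        down = mu if i > 0 else 0.0   # no repair is possible in state 0
        # First-passage recursion: step_i = 1/up + (down/up) * step_{i-1}
        step = (1.0 + down * step) / up
        total += step
    return total


def best_size(c, lam, mu, n_limit):
    """Search multiples of an integer c (so that k = N/c is an integer) for the
    system size N <= n_limit with the largest MTTF under the model above."""
    sizes = range(c, n_limit + 1, c)
    return max(sizes, key=lambda n: mttf(n, n // c, lam, mu))
```

For a given ρ = `lam / mu`, the value returned by a call such as `best_size(2, 0.01, 1.0, 400)` can be compared with the range quoted in the abstract. Agreement depends on how closely the assumed repair model matches the one analyzed in the paper.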

Index Terms:
Mean time to failure, k-resilient systems, Weibull distribution, Markov chains.
Citation:
José Fridman, Sampath Rangarajan, "Maximizing Mean-Time to Failure in k-Resilient Systems with Repair," IEEE Transactions on Computers, vol. 46, no. 2, pp. 229-234, Feb. 1997, doi:10.1109/12.565606