Estimators for Fault Tolerance Coverage Evaluation
February 1995 (vol. 44 no. 2)
pp. 261-274

Abstract—This paper addresses the problem of estimating the coverage of a fault tolerance mechanism through statistical processing of observations collected in fault injection experiments. A formal definition of coverage is given in terms of the fault and system activity sets that characterize the input space. Two categories of sampling techniques are considered for coverage estimation: sampling in the whole space and sampling in a space partitioned into classes. The estimators for each technique are compared by means of hypothetical examples. Techniques for early estimations of coverage are then studied. These techniques allow unbiased estimations of coverage to be made before all classes of the sampling space have been tested. Then, the “no-reply” problem that hampers most practical fault-injection experiments is discussed and an a posteriori stratification technique is proposed that allows the scope of incomplete tests to be widened by accounting for available structural information about the target system.

Index Terms—Coverage, fault injection, fault tolerance, estimation, sampling, variance reduction.
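
The two sampling categories named in the abstract can be illustrated with a minimal sketch. It uses the textbook estimators for a proportion under simple and stratified sampling, which are not necessarily the exact estimators derived in the paper. The function names, the 0/1 outcome encoding (1 = fault covered), and the assumption that the relative occurrence probability of each class is known are all hypothetical choices made for illustration.

```python
def simple_estimate(outcomes):
    """Coverage estimate from a sample drawn in the whole fault/activity space.

    outcomes: list of 0/1 values, 1 meaning the injected fault was covered.
    Returns (coverage estimate, estimated variance of that estimate).
    """
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two experiments")
    c_hat = sum(outcomes) / n
    # Unbiased estimate of Var(c_hat) for independent draws.
    var_hat = c_hat * (1 - c_hat) / (n - 1)
    return c_hat, var_hat


def stratified_estimate(classes):
    """Coverage estimate from a sample drawn in a space partitioned into classes.

    classes: list of (p_i, outcomes_i) pairs, where p_i is the assumed
    known relative probability of class i (the p_i must sum to 1) and
    outcomes_i is the list of 0/1 results observed in that class.
    Returns (coverage estimate, estimated variance of that estimate).
    """
    if abs(sum(p for p, _ in classes) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    c_hat = 0.0
    var_hat = 0.0
    for p, outcomes in classes:
        n_i = len(outcomes)
        if n_i < 2:
            raise ValueError("need at least two experiments per class")
        c_i = sum(outcomes) / n_i          # per-class coverage estimate
        c_hat += p * c_i                   # weight by class probability
        var_hat += p ** 2 * c_i * (1 - c_i) / (n_i - 1)
    return c_hat, var_hat


if __name__ == "__main__":
    print(simple_estimate([1, 1, 1, 0]))
    print(stratified_estimate([(0.5, [1, 1, 1, 1]), (0.5, [1, 0, 1, 0])]))
```

Under these assumptions, both calls in the example give the same point estimate, while the stratified variance estimate is smaller because one class shows no observed variability. This is the kind of comparison the abstract refers to when it lists variance reduction among the index terms.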

David Powell, Eliane Martins, Jean Arlat, Yves Crouzet, "Estimators for Fault Tolerance Coverage Evaluation," IEEE Transactions on Computers, vol. 44, no. 2, pp. 261-274, Feb. 1995, doi:10.1109/12.364537