Coverage Estimation Methods for Stratified Fault-Injection
July 1999 (vol. 48 no. 7)
pp. 707-723

Abstract—This paper addresses the problem of estimating fault tolerance coverage through statistical processing of observations collected in fault-injection experiments. In an earlier paper, various estimators based on simple sampling in the complete fault/activity input space and stratified sampling in a partitioned space were studied; frequentist confidence limits were derived based on a normal approximation. In this paper, the validity of this approximation is analyzed. The theory of confidence regions is introduced to estimate coverage without approximation when stratification is used. Three statistics are considered for defining confidence regions. It is shown that one—a vectorial statistic—is often more conservative than the other two. However, only the vectorial statistic is computationally tractable. We then consider Bayesian estimation methods for stratified sampling. Two methods are presented to obtain an approximation of the posterior distribution of the coverage by calculating its moments. The moments are then used to identify the type of the distribution in the Pearson distribution system, to estimate its parameters, and to obtain the coverage confidence limit. Three hypothetical example systems are used to compare the validity and the conservatism of the frequentist and Bayesian estimations.
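As an illustration of the kind of frequentist estimate the abstract refers to, the following is a minimal sketch of a stratified coverage estimate with a one-sided lower confidence limit based on a normal approximation. It is not taken from the paper: the function name, the argument names (`p` for the relative probability of each stratum, `n` for the number of injections per stratum, `d` for the number of covered faults per stratum), and the plain binomial variance estimate are all assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def stratified_coverage_lower_limit(p, n, d, confidence=0.99):
    """Return (point estimate, lower confidence limit) for coverage.

    Hypothetical helper, not the paper's method:
      p[i]  assumed relative probability of stratum i (should sum to 1)
      n[i]  number of fault injections performed in stratum i
      d[i]  number of those injections that were covered
    """
    if not (len(p) == len(n) == len(d)):
        raise ValueError("p, n and d must have the same length")

    # Stratified point estimate: weighted sum of per-stratum proportions.
    c_hat = sum(pi * di / ni for pi, ni, di in zip(p, n, d))

    # Assumed variance estimate: weighted sum of binomial variances.
    var = sum(
        pi ** 2 * (di / ni) * (1 - di / ni) / ni
        for pi, ni, di in zip(p, n, d)
    )

    # One-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(confidence)

    # Lower limit, clipped to 0 because coverage is a probability.
    lower = max(0.0, c_hat - z * sqrt(var))
    return c_hat, lower


if __name__ == "__main__":
    estimate, limit = stratified_coverage_lower_limit(
        p=[0.5, 0.5], n=[100, 100], d=[90, 100]
    )
    print(f"estimate={estimate:.4f} lower limit={limit:.4f}")
```

In this sketch, when every injected fault in every stratum is covered, the estimated variance is zero and the lower limit collapses onto the point estimate of 1. This is one simple case where a normal approximation gives no usable margin, which is consistent with the abstract's interest in checking the validity of that approximation.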

Index Terms:
Fault tolerance coverage, coverage estimation, fault-injection, stratified sampling, confidence regions, confidence limits, frequentist estimation, Bayesian estimation.
Michel Cukier, David Powell, Jean Arlat, "Coverage Estimation Methods for Stratified Fault-Injection," IEEE Transactions on Computers, vol. 48, no. 7, pp. 707-723, July 1999, doi:10.1109/12.780878