Evaluation of Several Nonparametric Bootstrap Methods to Estimate Confidence Intervals for Software Metrics
November 2003 (vol. 29 no. 11)
pp. 996-1004

Abstract—In typical data-analytic situations, sample statistics and model parameters are used to infer the properties, or characteristics, of the underlying population. A confidence interval estimates the range within which the true population value of a statistic lies. A narrow interval implies low variability of the statistic and justifies a stronger conclusion from the analysis. Many statistics used in software metrics analysis lack theoretical formulas that would allow such an accuracy assessment, and the Efron bootstrap appears to address this weakness. In this paper, we present an empirical analysis of the reliability of several Efron nonparametric bootstrap methods for assessing the accuracy of sample statistics in the context of software metrics. We briefly review the basic concepts of the methods available for estimating statistical errors and discuss the stated advantages of the Efron bootstrap. Several bootstrap algorithms are then validated across basic software metrics in both simulated and industrial software engineering contexts. The 90 percent confidence intervals for the mean, median, and Spearman correlation coefficient were predicted accurately. The 90 percent confidence intervals for the variance and the Pearson correlation coefficient were typically underestimated (actual coverage of 60-70 percent), while those for skewness and kurtosis were overestimated (actual coverage of 98-100 percent). The bias-corrected and accelerated (BCa) bootstrap gave the most consistent confidence intervals, but its accuracy depended on the metric examined. A method for correcting the under- and overestimation of bootstrap confidence intervals for small data sets is suggested, but its success was inconsistent across the tested metrics.
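The abstract judges bootstrap methods by how often a nominal 90 percent interval actually contains the true population value. The sketch below illustrates two common interval constructions (Efron's percentile interval and the BCa interval as described by Efron and Tibshirani) together with a simple Monte Carlo coverage check of the kind that reveals under- or overestimated intervals. It is not the authors' code: the function names (`percentile_ci`, `bca_ci`, `coverage`), the linear-interpolation quantile, the resample counts, the sample data, and the normal test population are all illustrative assumptions, and the paper's own small-sample correction is not reproduced. It uses only the Python standard library (3.8 or later, for `statistics.NormalDist`).

```python
import random
from statistics import NormalDist, mean, median

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def _quantile(sorted_vals, q):
    """Empirical quantile of a sorted list by linear interpolation (0 < q < 1).
    Illustrative choice; other quantile definitions are equally valid."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_replicates(data, stat, n_boot, rng):
    """Resample the data with replacement n_boot times and recompute the statistic."""
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_ci(data, stat, alpha=0.10, n_boot=2000, seed=0):
    """Efron percentile interval: alpha/2 and 1-alpha/2 quantiles of the replicates.
    data must be a list of numbers; alpha=0.10 gives a nominal 90 percent interval."""
    rng = random.Random(seed)
    reps = sorted(bootstrap_replicates(data, stat, n_boot, rng))
    return _quantile(reps, alpha / 2), _quantile(reps, 1 - alpha / 2)


def bca_ci(data, stat, alpha=0.10, n_boot=2000, seed=0):
    """Bias-corrected and accelerated (BCa) interval (sketch, after Efron and Tibshirani)."""
    rng = random.Random(seed)
    theta_hat = stat(data)
    reps = sorted(bootstrap_replicates(data, stat, n_boot, rng))

    # Bias correction z0: proportion of replicates below the original estimate.
    below = sum(r < theta_hat for r in reps) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = _STD_NORMAL.inv_cdf(below)

    # Acceleration a from jackknife (leave-one-out) values of the statistic.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(len(data))]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        # Map a nominal quantile level q to the BCa-adjusted level.
        z = _STD_NORMAL.inv_cdf(q)
        return _STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return _quantile(reps, adjusted(alpha / 2)), _quantile(reps, adjusted(1 - alpha / 2))


def coverage(ci_fn, stat, true_value, sampler, n=20, trials=200, alpha=0.10, seed=1):
    """Fraction of simulated samples whose interval contains the known population value.
    sampler(rng) draws one observation from a hypothetical test population."""
    rng = random.Random(seed)
    hits = 0
    for t in range(trials):
        sample = [sampler(rng) for _ in range(n)]
        lo, hi = ci_fn(sample, stat, alpha=alpha, n_boot=500, seed=t)
        hits += lo <= true_value <= hi
    return hits / trials


if __name__ == "__main__":
    # Hypothetical metric values (e.g. lines of code per module), not data from the paper.
    sample = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0, 310.0, 70.0, 130.0, 110.0]
    print("percentile CI for median:", percentile_ci(sample, median))
    print("BCa CI for median:", bca_ci(sample, median))
    sim = coverage(percentile_ci, mean, 10.0, lambda r: r.gauss(10.0, 2.0), trials=100)
    print("estimated coverage of nominal 90% percentile CI for the mean:", sim)
```

With `alpha=0.10` each function returns a nominal 90 percent interval. In a coverage study like the one sketched here, an estimate well below 0.90 corresponds to the underestimation the abstract reports for the variance and Pearson correlation, and an estimate near 1.0 to the overestimation reported for skewness and kurtosis.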

Index Terms:
Efron bootstrap, software metrics, confidence intervals, correction of possible biases in Efron bootstrap estimates.
Skylar Lei, Michael R. Smith, "Evaluation of Several Nonparametric Bootstrap Methods to Estimate Confidence Intervals for Software Metrics," IEEE Transactions on Software Engineering, vol. 29, no. 11, pp. 996-1004, Nov. 2003, doi:10.1109/TSE.2003.1245301