A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content
June 2000 (vol. 26 no. 6)
pp. 518-540

Abstract—An important requirement for controlling the inspection of software artifacts is the ability to decide, based on more objective information, whether the inspection can stop or should continue in order to achieve a suitable level of artifact quality. A prediction of the number of defects remaining in an inspected artifact can support this decision. Several studies in software engineering have considered capture-recapture models, originally proposed by biologists to estimate animal populations, to make such a prediction. However, few studies compare the actual number of remaining defects with the number predicted by a capture-recapture model on real software engineering artifacts. Thus, there is little work examining the robustness of capture-recapture models under realistic software engineering conditions, where some of their assumptions are expected to be violated. Simulations have been performed, but no definite conclusions can be drawn regarding the accuracy of such models under realistic inspection conditions or the factors affecting this accuracy. Furthermore, existing studies have focused on a subset of the available capture-recapture models, so a more exhaustive comparison is still missing. In this study, we focus on traditional inspections and estimate, based on actual inspection data, the accuracy of relevant, state-of-the-art capture-recapture models as they have been proposed in biology and for which statistical estimators exist. To assess their robustness, we look at the impact of the number of inspectors and the number of actual defects on the estimators' accuracy. Our results show that models are strongly affected by the number of inspectors; therefore, this factor must be considered before using capture-recapture models. When the number of inspectors is too small, no model is sufficiently accurate, and underestimation may be substantial.
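To illustrate the basic idea, the simplest capture-recapture estimator treats each inspector as one "capture occasion" and each reported defect as a captured animal. The sketch below uses the classic two-sample Lincoln-Petersen estimator, which is shown here only as an illustration and is not claimed to be one of the specific estimators evaluated in the paper. It also assumes relative error is defined as (estimate - actual) / actual, so negative values indicate underestimation. The function names and the defect data are hypothetical.

```python
from typing import Set


def lincoln_petersen(found_a: Set[int], found_b: Set[int]) -> float:
    """Two-inspector capture-recapture estimate of total defect content.

    n1, n2 = defects found by each inspector; m = defects found by both.
    N_hat = n1 * n2 / m, which is undefined when the inspectors share no defects.
    """
    m = len(found_a & found_b)
    if m == 0:
        raise ValueError("no overlap between inspectors; estimate is undefined")
    return len(found_a) * len(found_b) / m


def relative_error(estimate: float, actual: int) -> float:
    """Assumed accuracy measure: negative values mean underestimation."""
    return (estimate - actual) / actual


# Hypothetical data: defect IDs reported by two inspectors of one document.
inspector_a = {1, 2, 3, 4, 5, 6}
inspector_b = {4, 5, 6, 7}

n_hat = lincoln_petersen(inspector_a, inspector_b)
distinct_found = len(inspector_a | inspector_b)
print(n_hat)                        # 8.0 estimated defects in total
print(n_hat - distinct_found)       # 1.0 estimated remaining defects
print(relative_error(n_hat, 10))    # -0.2 if the document really had 10 defects
```

With only two inspectors the estimate hinges on a small overlap (here three shared defects), which is one intuitive reason why accuracy depends so strongly on the number of inspectors.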
In addition, some models perform better than others in a large number of conditions, and plausible reasons for this are discussed. Based on our analyses, we recommend using a model that accounts for defects having different probabilities of being detected, together with its corresponding Jackknife Estimator. Furthermore, we attempt to calibrate the prediction models based on their relative error, as previously computed on other inspections. Although this approach is intuitive and straightforward, we identified theoretical limitations to it, which were then confirmed by the data.
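The recommended combination can be sketched as follows, assuming the model in question is the heterogeneity model (often written Mh) and using only the first-order jackknife, N_hat = S + f1 (t - 1) / t, where S is the number of distinct defects found, f1 the number found by exactly one inspector, and t the number of inspectors. The full Burnham-Overton procedure selects among higher-order jackknife estimators with a sequential test, which is omitted here. The calibration function is a hypothetical reading of "calibrating on relative error": if RE = (estimate - actual) / actual, then actual = estimate / (1 + RE), so the sketch divides by one plus the mean historical relative error. The paper's exact procedure may differ, and all names and data are illustrative.

```python
from collections import Counter
from statistics import mean
from typing import List, Set


def jackknife_mh(detections: List[Set[int]]) -> float:
    """First-order jackknife estimate of defect content under model Mh.

    detections[j] is the set of defect IDs reported by inspector j.
    N_hat = S + f1 * (t - 1) / t
    """
    t = len(detections)
    if t < 2:
        raise ValueError("need at least two inspectors")
    # How many inspectors found each defect.
    counts = Counter(d for found in detections for d in found)
    s = len(counts)                                   # distinct defects found
    f1 = sum(1 for c in counts.values() if c == 1)   # found by exactly one inspector
    return s + f1 * (t - 1) / t


def calibrate(estimate: float, past_relative_errors: List[float]) -> float:
    """Hypothetical correction: divide by (1 + mean historical relative error)."""
    factor = 1 + mean(past_relative_errors)
    if factor <= 0:
        raise ValueError("mean relative error must be greater than -1")
    return estimate / factor


# Hypothetical inspection with three inspectors.
team = [{1, 2, 3, 4}, {2, 3, 5}, {3, 4, 6}]
n_hat = jackknife_mh(team)
print(n_hat)                                     # 8.0
print(calibrate(n_hat, [-0.25, -0.5, -0.75]))    # 16.0
```

The calibration step assumes that past relative errors transfer to the new inspection; as the abstract notes, this intuitive correction has theoretical limitations that the data confirmed, so the sketch illustrates the idea rather than endorsing it.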

Index Terms:
Inspections, capture-recapture models, robustness, fault content estimation.
Lionel C. Briand, Khaled El Emam, Bernd G. Freimut, Oliver Laitenberger, "A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content," IEEE Transactions on Software Engineering, vol. 26, no. 6, pp. 518-540, June 2000, doi:10.1109/32.852741