Using Machine Learning for Estimating the Defect Content After an Inspection
January 2004 (vol. 30 no. 1)
pp. 17-28

Abstract—We view the problem of estimating the defect content of a document after an inspection as a machine learning problem: The goal is to learn from empirical data the relationship between certain observable features of an inspection (such as the total number of different defects detected) and the number of defects actually contained in the document. We show that some features can carry significant nonlinear information about the defect content. Therefore, we use a nonlinear regression technique, neural networks, to solve the learning problem. To select the best among all neural networks trained on a given data set, one usually reserves part of the data set for later cross-validation; in contrast, we use a technique which leaves the full data set for training. This is an advantage when the data set is small. We validate our approach on a known empirical inspection data set. For that benchmark, our novel approach clearly outperforms both linear regression and the current standard methods in software engineering for estimating the defect content, such as capture-recapture. The validation also shows that our machine learning approach can be successful even when the empirical inspection data set is small.
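The learning problem described in the abstract can be illustrated with a minimal sketch: a small neural network fitted by gradient descent to map one observable inspection feature (the total number of different defects detected) to the defect content. Everything concrete here is an assumption made for illustration and is not taken from the article: the data values are invented, the network size, learning rate, and function names (`train_mlp`, `predict`, `mse`) are hypothetical, and plain gradient descent stands in for whatever training procedure the authors used. The model-selection technique that avoids a held-out validation set is not shown.

```python
import math
import random

def train_mlp(xs, ys, hidden=3, epochs=3000, lr=0.05, seed=0):
    """Fit y ~ sum_j w2[j] * tanh(w1[j] * x + b1[j]) + b2 by batch gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g_w1 = [0.0] * hidden
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            g_b2 += err
            for j in range(hidden):
                g_w2[j] += err * h[j]
                # Backpropagate through tanh: d tanh(z) / dz = 1 - tanh(z)^2
                back = err * w2[j] * (1.0 - h[j] ** 2)
                g_w1[j] += back * x
                g_b1[j] += back
        # Gradient of half the mean squared error; the factor 2 is absorbed into lr.
        for j in range(hidden):
            w1[j] -= lr * g_w1[j] / n
            b1[j] -= lr * g_b1[j] / n
            w2[j] -= lr * g_w2[j] / n
        b2 -= lr * g_b2 / n
    return w1, b1, w2, b2

def predict(params, x):
    """Evaluate the network at a single (scaled) input value."""
    w1, b1, w2, b2 = params
    return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(len(w1))) + b2

def mse(params, xs, ys):
    """Mean squared error of the network on the given data."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Invented example data (NOT from the article): defects detected per inspection
# and the corresponding true defect content of the inspected document.
detected = [4, 6, 8, 10, 12, 14, 16, 18]
actual = [10, 13, 15, 18, 20, 23, 27, 30]
SCALE = 30.0  # assumed scaling so that inputs and targets lie in [0, 1]
xs = [d / SCALE for d in detected]
ys = [a / SCALE for a in actual]

initial = train_mlp(xs, ys, epochs=0)  # untrained network, same seed
trained = train_mlp(xs, ys)

print("training MSE before:", mse(initial, xs, ys))
print("training MSE after: ", mse(trained, xs, ys))
print("estimate for 11 detected defects:", predict(trained, 11 / SCALE) * SCALE)
```

With several features instead of one, the same structure applies with a weight vector per hidden unit. Training error alone says nothing about generalization, which is why the abstract emphasizes how the best network is selected when the data set is small.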

Index Terms:
Defect content estimation, software inspections, nonlinear regression, neural networks, empirical methods.
Frank Padberg, Thomas Ragg, Ralf Schoknecht, "Using Machine Learning for Estimating the Defect Content After an Inspection," IEEE Transactions on Software Engineering, vol. 30, no. 1, pp. 17-28, Jan. 2004, doi:10.1109/TSE.2004.1265733