A General Software Defect-Proneness Prediction Framework
May/June 2011 (vol. 37, no. 3)
pp. 356-370
Qinbao Song, Xi'an Jiaotong University, Xi'an
Zihan Jia, Xi'an Jiaotong University, Xi'an
Martin Shepperd, Brunel University, Uxbridge
Shi Ying, Wuhan University, Wuhan
Jin Liu, Wuhan University, Wuhan
BACKGROUND—Predicting defect-prone software components is an economically important activity and so has received a good deal of attention. However, making sense of the many, and sometimes seemingly inconsistent, results is difficult. OBJECTIVE—We propose and evaluate a general framework for software defect prediction that supports 1) unbiased and 2) comprehensive comparison between competing prediction systems. METHOD—The framework comprises 1) scheme evaluation and 2) defect prediction components. The scheme evaluation analyzes the prediction performance of competing learning schemes for given historical data sets. The defect predictor builds a model according to the evaluated learning scheme and predicts software defects in new data according to the constructed model. In order to demonstrate the performance of the proposed framework, we use both simulation and publicly available software defect data sets. RESULTS—The results show that we should choose different learning schemes for different data sets (i.e., no scheme dominates), that small details in how evaluations are conducted can completely reverse findings, and, finally, that our proposed framework is more effective and less prone to bias than previous approaches. CONCLUSIONS—Failure to properly or fully evaluate a learning scheme can be misleading; however, these problems may be overcome by our proposed framework.
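The two-component structure described in the abstract — first evaluating competing learning schemes on historical data, then building the defect predictor with the winning scheme — can be illustrated with a minimal sketch. This is not the authors' implementation: the toy learners (`majority_scheme`, `threshold_scheme`), the single size metric, and the accuracy measure are all illustrative assumptions standing in for real learning schemes and performance measures.

```python
# Hypothetical sketch of the framework's two components:
#  1) scheme evaluation: compare competing learning schemes on
#     historical data via cross-validation;
#  2) defect prediction: train the winning scheme on all historical
#     data and apply it to new modules.
import random

def cross_val_score(scheme, data, k=5, seed=0):
    """Mean accuracy of `scheme` over k folds of (metric, label) rows."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = scheme(train)
        hits = sum(model(x) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

def evaluate_schemes(schemes, historical):
    """Scheme evaluation: pick the scheme with the best CV accuracy."""
    return max(schemes, key=lambda s: cross_val_score(s, historical))

def majority_scheme(train):
    """Toy learner: always predict the majority class of the training set."""
    ones = sum(y for _, y in train)
    label = 1 if ones * 2 >= len(train) else 0
    return lambda x: label

def threshold_scheme(train):
    """Toy learner: flag a module as defect-prone when its size metric
    exceeds the training mean (a stand-in for a real learning scheme)."""
    mean = sum(x for x, _ in train) / len(train)
    return lambda x: 1 if x > mean else 0

# Hypothetical historical data: (size metric, defect-prone label) per module.
historical = [(i, 1 if i > 50 else 0) for i in range(100)]
best = evaluate_schemes([majority_scheme, threshold_scheme], historical)
predictor = best(historical)          # defect predictor built with best scheme
print(predictor(80), predictor(10))   # predictions for two new modules
```

The point of the sketch matches the abstract's RESULTS: which scheme wins depends entirely on the data set, so scheme selection must be re-run, with a carefully controlled evaluation procedure, for each new historical data set.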

[1] B.T. Compton and C. Withrow, "Prediction and Control of ADA Software Defects," J. Systems and Software, vol. 12, no. 3, pp. 199-207, 1990.
[2] J. Munson and T.M. Khoshgoftaar, "Regression Modelling of Software Quality: Empirical Investigation," J. Electronic Materials, vol. 19, no. 6, pp. 106-114, 1990.
[3] N.B. Ebrahimi, "On the Statistical Analysis of the Number of Errors Remaining in a Software Design Document After Inspection," IEEE Trans. Software Eng., vol. 23, no. 8, pp. 529-532, Aug. 1997.
[4] S. Vander Wiel and L. Votta, "Assessing Software Designs Using Capture-Recapture Methods," IEEE Trans. Software Eng., vol. 19, no. 11, pp. 1045-1054, Nov. 1993.
[5] P. Runeson and C. Wohlin, "An Experimental Evaluation of an Experience-Based Capture-Recapture Method in Software Code Inspections," Empirical Software Eng., vol. 3, no. 4, pp. 381-406, 1998.
[6] L.C. Briand, K. El Emam, B.G. Freimut, and O. Laitenberger, "A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content," IEEE Trans. Software Eng., vol. 26, no. 6, pp. 518-540, June 2000.
[7] K. El Emam and O. Laitenberger, "Evaluating Capture-Recapture Models with Two Inspectors," IEEE Trans. Software Eng., vol. 27, no. 9, pp. 851-864, Sept. 2001.
[8] C. Wohlin and P. Runeson, "Defect Content Estimations from Review Data," Proc. 20th Int'l Conf. Software Eng., pp. 400-409, 1998.
[9] G.Q. Kenney, "Estimating Defects in Commercial Software during Operational Use," IEEE Trans. Reliability, vol. 42, no. 1, pp. 107-115, Mar. 1993.
[10] F. Padberg, T. Ragg, and R. Schoknecht, "Using Machine Learning for Estimating the Defect Content After an Inspection," IEEE Trans. Software Eng., vol. 30, no. 1, pp. 17-28, Jan. 2004.
[11] N.E. Fenton and M. Neil, "A Critique of Software Defect Prediction Models," IEEE Trans. Software Eng., vol. 25, no. 5, pp. 675-689, Sept./Oct. 1999.
[12] Q. Song, M. Shepperd, M. Cartwright, and C. Mair, "Software Defect Association Mining and Defect Correction Effort Prediction," IEEE Trans. Software Eng., vol. 32, no. 2, pp. 69-82, Feb. 2006.
[13] A. Porter and R. Selby, "Empirically Guided Software Development Using Metric-Based Classification Trees," IEEE Software, vol. 7, no. 2, pp. 46-54, Mar. 1990.
[14] J.C. Munson and T.M. Khoshgoftaar, "The Detection of Fault-Prone Programs," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 423-433, May 1992.
[15] V.R. Basili, L.C. Briand, and W.L. Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators," IEEE Trans. Software Eng., vol. 22, no. 10, pp. 751-761, Oct. 1996.
[16] T.M. Khoshgoftaar, E.B. Allen, J.P. Hudepohl, and S.J. Aud, "Application of Neural Networks to Software Quality Modeling of a Very Large Telecommunications System," IEEE Trans. Neural Networks, vol. 8, no. 4, pp. 902-909, July 1997.
[17] T.M. Khoshgoftaar, E.B. Allen, W.D. Jones, and J.P. Hudepohl, "Classification Tree Models of Software Quality over Multiple Releases," Proc. 10th Int'l Symp. Software Reliability Eng., pp. 116-125, 1999.
[18] K. Ganesan, T.M. Khoshgoftaar, and E. Allen, "Case-Based Software Quality Prediction," Int'l J. Software Eng. and Knowledge Eng., vol. 10, no. 2, pp. 139-152, 2000.
[19] K. El Emam, S. Benlarbi, N. Goel, and S.N. Rai, "Comparing Case-Based Reasoning Classifiers for Predicting High Risk Software Components," J. Systems and Software, vol. 55, no. 3, pp. 301-320, 2001.
[20] L. Zhan and M. Reformat, "A Practical Method for the Software Fault-Prediction," Proc. IEEE Int'l Conf. Information Reuse and Integration, pp. 659-666, 2007.
[21] T.M. Khoshgoftaar and N. Seliya, "Analogy-Based Practical Classification Rules for Software Quality Estimation," Empirical Software Eng., vol. 8, no. 4, pp. 325-350, 2003.
[22] L. Guo, Y. Ma, B. Cukic, and H. Singh, "Robust Prediction of Fault-Proneness by Random Forests," Proc. 15th Int'l Symp. Software Reliability Eng., pp. 417-428, 2004.
[23] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Trans. Software Eng., vol. 33, no. 1, pp. 2-13, Jan. 2007.
[24] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," IEEE Trans. Software Eng., vol. 34, no. 4, pp. 485-496, July/Aug. 2008.
[25] I. Myrtveit, E. Stensrud, and M. Shepperd, "Reliability and Validity in Comparative Studies of Software Prediction Models," IEEE Trans. Software Eng., vol. 31, no. 5, pp. 380-391, May 2005.
[26] K. Srinivasan and D. Fisher, "Machine Learning Approaches to Estimating Development Effort," IEEE Trans. Software Eng., vol. 21, no. 2, pp. 126-137, Feb. 1995.
[27] J. Tian and M. Zelkowitz, "Complexity Measure Evaluation and Selection," IEEE Trans. Software Eng., vol. 21, no. 8, pp. 641-649, Aug. 1995.
[28] M.A. Hall, "Correlation Based Attribute Selection for Discrete and Numeric Class Machine Learning," Proc. 17th Int'l Conf. Machine Learning, pp. 359-366, 2000.
[29] S. Wagner, "A Literature Survey of the Quality Economics of Defect-Detection Techniques," Proc. ACM/IEEE Int'l Symp. Empirical Software Eng., pp. 194-203, 2006.
[30] P. Runeson, C. Andersson, T. Thelin, A. Andrews, and T. Berling, "What Do We Know about Defect Detection Methods?" IEEE Software, vol. 23, no. 3, pp. 82-90, May/June 2006.
[31] J. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[32] M. Hall and G. Holmes, "Benchmarking Attribute Selection Techniques for Discrete Class Data Mining," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 6, pp. 1437-1447, Nov./Dec. 2003.
[33] C. Cardie, "Using Decision Trees to Improve Case-Based Learning," Proc. 10th Int'l Conf. Machine Learning, pp. 25-32, 1993.
[34] M. Kubat, D. Flotzinger, and G. Pfurtscheller, "Discovering Patterns in EEG-Signals: Comparative Study of a Few Methods," Proc. European Conf. Machine Learning, pp. 366-371, 1993.
[35] D. Kibler and D.W. Aha, "Learning Representative Exemplars of Concepts: An Initial Case Study," Proc. Fourth Int'l Workshop Machine Learning, pp. 24-30, 1987.
[36] R. Kohavi and G. John, "Wrappers for Feature Selection for Machine Learning," Artificial Intelligence, vol. 97, pp. 273-324, 1997.
[37] M. Chapman, P. Callis, and W. Jackson, "Metrics Data Program," technical report, NASA IV and V Facility, 2004.
[38] K.O. Elish and M.O. Elish, "Predicting Defect-Prone Software Modules Using Support Vector Machines," J. Systems and Software, vol. 81, no. 5, pp. 649-660, 2008.
[39] B. Turhan and A. Bener, "Analysis of Naive Bayes Assumptions on Software Fault Data: An Empirical Study," Data & Knowledge Eng., vol. 68, no. 2, pp. 278-290, 2009.
[40] Y. Jiang, B. Cukic, and Y. Ma, "Techniques for Evaluating Fault Prediction Models," Empirical Software Eng., vol. 13, pp. 561-595, 2008.
[41] H. Zhang and X. Zhang, "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'," IEEE Trans. Software Eng., vol. 33, no. 9, pp. 635-636, Sept. 2007.
[42] T. Menzies, A. Dekhtyar, J. Distefano, and J. Greenwald, "Problems with Precision: A Response to Comments on Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Trans. Software Eng., vol. 33, no. 9, pp. 637-640, Sept. 2007.
[43] A. Oral and A. Bener, "Defect Prediction for Embedded Software," Proc. 22nd Int'l Symp. Computer and Information Sciences, pp. 1-6, 2007.
[44] B. Turhan and A. Bener, "Software Defect Prediction: Heuristics for Weighted Naive Bayes," Proc. Seventh Int'l Conf. Quality Software, pp. 231-237, 2007.

Index Terms:
Software defect prediction, software defect-proneness prediction, machine learning, scheme evaluation.
Qinbao Song, Zihan Jia, Martin Shepperd, Shi Ying, Jin Liu, "A General Software Defect-Proneness Prediction Framework," IEEE Transactions on Software Engineering, vol. 37, no. 3, pp. 356-370, May-June 2011, doi:10.1109/TSE.2010.90