Software Defect Association Mining and Defect Correction Effort Prediction
February 2006 (vol. 32 no. 2)
pp. 69-82
Much current software defect prediction work focuses on the number of defects remaining in a software system. In this paper, we present association rule mining based methods to predict defect associations and defect correction effort, to help developers detect software defects and to assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data, which covers more than 200 projects over more than 15 years. The results show that, for defect association prediction, the accuracy is very high and the false-negative rate is very low. Likewise, for defect correction effort prediction, the accuracy of both defect isolation effort prediction and defect correction effort prediction is high. We compared the defect correction effort prediction method with other types of methods (PART, C4.5, and Naïve Bayes) and show that accuracy improved by at least 23 percent. We also evaluated the impact of the support and confidence levels on prediction accuracy, false-negative rate, false-positive rate, and the number of rules generated. We found that higher support and confidence levels do not necessarily yield higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy.
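To illustrate the support and confidence thresholds the abstract refers to, the sketch below mines simple single-consequent association rules (A → b) from co-occurring defect attributes. This is a minimal, generic illustration of association rule mining in the style of Agrawal et al. [1], not the authors' actual method; the defect-type transactions and the `mine_rules` helper are hypothetical.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Mine single-consequent rules A -> b from sets of co-occurring items.

    support(A -> b)    = fraction of transactions containing A and b
    confidence(A -> b) = (count of A and b) / (count of A)
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        # Count itemsets up to size 3 (antecedent of up to 2 items + consequent).
        for k in range(1, 4):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1

    rules = []
    for itemset, cnt in counts.items():
        if len(itemset) < 2:
            continue
        support = cnt / n
        if support < min_support:
            continue
        # Try each item as the consequent, the rest as the antecedent.
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            confidence = cnt / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules

# Hypothetical records: each "transaction" lists defect types found together.
transactions = [
    {"interface", "logic"},
    {"interface", "logic", "data"},
    {"interface", "data"},
    {"logic", "computation"},
    {"interface", "logic"},
]
rules = mine_rules(transactions, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, f"sup={support:.2f} conf={confidence:.2f}")
```

Raising `min_support` or `min_confidence` prunes rules; as the abstract notes, too few surviving rules can leave new cases unmatched, which is why higher thresholds do not automatically improve prediction accuracy.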

[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD Conf. Management of Data, May 1993.
[2] K. Ali, S. Manganaris, and R. Srikant, “Partial Classification Using Association Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining, pp. 115-118, 1997.
[3] I.S. Bhandari, “Attribute Focusing: Machine-Assisted Knowledge Discovery Applied to Software Production Process Control,” Proc. Workshop Knowledge Discovery in Databases, July 1993.
[4] I.S. Bhandari, M. Halliday, E. Tarver, D. Brown, J. Chaar, and R. Chillarege, “A Case Study of Software Process Improvement During Development,” IEEE Trans. Software Eng., vol. 19, no. 12, pp. 1157-1170, 1993.
[5] I.S. Bhandari, M.J. Halliday, J. Chaar, R. Chillarege, K. Jones, J.S. Atkinson, C. Lepori-Costello, P.Y. Jasper, E.D. Tarver, C.C. Lewis, and M. Yonezawa, “In-Process Improvement through Defect Data Interpretation,” IBM Systems J., vol. 33, no. 1, pp. 182-214, 1994.
[6] L.C. Briand, K. El-Emam, B.G. Freimut, and O. Laitenberger, “A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content,” IEEE Trans. Software Eng., vol. 26, no. 6, pp. 518-540, June 2000.
[7] T. Compton and C. Withrow, “Prediction and Control of Ada Software Defects,” J. Systems and Software, vol. 12, pp. 199-207, 1990.
[8] G. Dong, X. Zhang, L. Wong, and J. Li, “CAEP: Classification by Aggregating Emerging Patterns,” Proc. Second Int'l Conf. Discovery Science, pp. 30-42, 1999.
[9] N.B. Ebrahimi, “On the Statistical Analysis of the Number of Errors Remaining in a Software Design Document after Inspection,” IEEE Trans. Software Eng., vol. 23, no. 8, pp. 529-532, 1997.
[10] K. El-Emam and O. Laitenberger, “Evaluating Capture-Recapture Models with Two Inspectors,” IEEE Trans. Software Eng., vol. 27, no. 9, pp. 851-864, 2001.
[11] N.E. Fenton and M. Neil, “A Critique of Software Defect Prediction Models,” IEEE Trans. Software Eng., vol. 25, no. 5, pp. 676-689, 1999.
[12] N.E. Fenton and S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, second ed. Int'l Thomson Computer Press, 1996.
[13] E. Frank, L. Trigg, G. Holmes, and I.H. Witten, “Naïve Bayes for Regression,” Machine Learning, vol. 41, no. 1, pp. 5-25, 2000.
[14] E. Frank and I.H. Witten, “Generating Accurate Rule Sets without Global Optimization,” Proc. 15th Int'l Conf. Machine Learning, pp. 144-151, 1998.
[15] G. Heller, J. Valett, and M. Wild, “Data Collection Procedure for the Software Engineering Laboratory (SEL) Database,” Technical Report SEL-92-002, Software Eng. Laboratory, 1992.
[16] G.Q. Kenney, “Estimating Defects in Commercial Software During Operational Use,” IEEE Trans. Reliability, vol. 42, no. 1, pp. 107-115, 1993.
[17] B. Liu, W. Hsu, and Y. Ma, “Integrating Classification and Association Rule Mining,” Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, pp. 80-86, 1998.
[18] J.C. Munson and T.M. Khoshgoftaar, “Regression Modelling of Software Quality: An Empirical Investigation,” Information and Software Technology, vol. 32, no. 2, pp. 106-114, 1990.
[19] F. Padberg, T. Ragg, and R. Schoknecht, “Using Machine Learning for Estimating the Defect Content after an Inspection,” IEEE Trans. Software Eng., vol. 30, no. 1, pp. 17-28, 2004.
[20] J.R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, Calif.: Morgan Kaufmann, 1993.
[21] P. Runeson and C. Wohlin, “An Experimental Evaluation of an Experience-Based Capture-Recapture Method in Software Code Inspections,” Empirical Software Eng., vol. 3, no. 3, pp. 381-406, 1998.
[22] SEL Database, http://www.cebase.org/www/frames.html?/www/ResourcesSEL/, 2005.
[23] R. She, F. Chen, K. Wang, M. Ester, J.L. Gardy, and F.L. Brinkman, “Frequent-Subsequence-Based Prediction of Outer Membrane Proteins,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2003.
[24] R. Srikant, Q. Vu, and R. Agrawal, “Mining Association Rules with Item Constraints,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining (KDD '97), pp. 67-73, Aug. 1997.
[25] K. Wang, S.Q. Zhou, and S.C. Liew, “Building Hierarchical Classifiers Using Class Proximity,” Proc. 25th Int'l Conf. Very Large Data Bases, pp. 363-374, 1999.
[26] K. Wang, S. Zhou, and Y. He, “Growing Decision Tree on Support-Less Association Rules,” Proc. Sixth Int'l Conf. Knowledge Discovery and Data Mining, 2000.
[27] S.V. Wiel and L. Votta, “Assessing Software Designs Using Capture-Recapture Methods,” IEEE Trans. Software Eng., vol. 19, no. 11, pp. 1045-1054, 1993.
[28] C. Wohlin and P. Runeson, “Defect Content Estimations from Review Data,” Proc. 20th Int'l Conf. Software Eng., pp. 400-409, 1998.
[29] Q. Yang, H.H. Zhang, and T. Li, “Mining Web Logs for Prediction Models in WWW Caching and Prefetching,” Proc. Seventh ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, Aug. 2001.
[30] X. Yin and J. Han, “CPAR: Classification Based on Predictive Association Rules,” Proc. 2003 SIAM Int'l Conf. Data Mining, May 2003.
[31] A.T.T. Ying, G.C. Murphy, R. Ng, and M.C. Chu-Carroll, “Predicting Source Code Changes by Mining Revision History,” Proc. First Int'l Workshop Mining Software Repositories, 2004.
[32] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller, “Mining Version Histories to Guide Software Changes,” Proc. 26th Int'l Conf. Software Eng., 2004.

Index Terms:
Software defect prediction, defect association, defect isolation effort, defect correction effort.
Citation:
Qinbao Song, Martin Shepperd, Michelle Cartwright, Carolyn Mair, "Software Defect Association Mining and Defect Correction Effort Prediction," IEEE Transactions on Software Engineering, vol. 32, no. 2, pp. 69-82, Feb. 2006, doi:10.1109/TSE.2006.19