Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings
Issue No. 04 - July/August (2008 vol. 34)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2008.35
Stefan Lessmann, University of Hamburg, Hamburg
Bart Baesens, K.U.Leuven, Leuven
Christophe Mues, University of Southampton, Southampton
Swantje Pietsch, University of Hamburg, Hamburg
Software defect prediction strives to improve software quality and testing efficiency by constructing predictive classification models from code attributes to enable a timely identification of fault-prone modules. Several classification models have been evaluated for this task. However, due to inconsistent findings regarding the superiority of one classifier over another and the usefulness of metric-based classification in general, more research is needed to improve convergence across studies and further advance confidence in experimental results. We consider three potential sources of bias: comparing classifiers over one or a small number of proprietary datasets, relying on accuracy indicators that are conceptually inappropriate for software defect prediction and cross-study comparisons, and finally, limited use of statistical testing procedures to secure empirical findings. To remedy these problems, a framework for comparative software defect prediction experiments is proposed and applied in a large-scale empirical comparison of 22 classifiers over ten public domain datasets from the NASA Metrics Data repository. Our results indicate that the importance of the particular classification algorithm may have been overestimated in previous research since no significant performance differences could be detected among the top 17 classifiers.
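As a rough illustration of what statistically testing several classifiers over several datasets can look like, the sketch below ranks classifiers per dataset and computes a Friedman chi-square statistic from the average ranks. The abstract names neither the performance measure nor the test used, so the choice of the Friedman statistic is an assumption made here for illustration, as are the function names and the toy scores, which are invented and are not results from the paper. Tie correction of the statistic is omitted for brevity.

```python
from statistics import mean


def average_ranks(scores):
    """Mean rank of each classifier across datasets (1 = best).

    scores: dict mapping classifier name -> list of per-dataset scores,
    where a higher score is better. Tied scores share the average rank.
    """
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    ranks = {name: [] for name in names}
    for d in range(n_datasets):
        # Sort classifiers on this dataset from best to worst score.
        column = sorted(((scores[name][d], name) for name in names),
                        key=lambda pair: -pair[0])
        i = 0
        while i < len(column):
            j = i
            while j + 1 < len(column) and column[j + 1][0] == column[i][0]:
                j += 1
            # Positions i+1 .. j+1 (1-based) are tied; share their mean.
            shared = (i + 1 + j + 1) / 2
            for k in range(i, j + 1):
                ranks[column[k][1]].append(shared)
            i = j + 1
    return {name: mean(r) for name, r in ranks.items()}


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction).

    With k classifiers and n datasets, large values suggest that the
    classifiers' average ranks differ more than chance would explain.
    """
    k = len(scores)
    n = len(next(iter(scores.values())))
    avg = average_ranks(scores)
    expected = (k + 1) / 2  # mean rank if all classifiers were equivalent
    return 12 * n / (k * (k + 1)) * sum((r - expected) ** 2 for r in avg.values())


# Hypothetical scores for three classifiers on four datasets (toy data).
toy_scores = {
    "clf_a": [0.81, 0.77, 0.90, 0.69],
    "clf_b": [0.79, 0.75, 0.88, 0.66],
    "clf_c": [0.70, 0.71, 0.80, 0.60],
}
print(average_ranks(toy_scores))
print(friedman_statistic(toy_scores))
```

The statistic would then be compared against a chi-square distribution with k - 1 degrees of freedom; a non-significant value is the kind of outcome that would support a "no detectable difference" conclusion among the compared classifiers.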
Complexity measures, Data mining, Formal methods, Statistical methods
S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," in IEEE Transactions on Software Engineering, vol. 34, no. 4, pp. 485-496, 2008.