2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC) (2016)
Atlanta, GA, USA
June 10, 2016 to June 14, 2016
Software defect prediction is an important software quality assurance technique. It uses historical project data and previously discovered defects to predict potential defects. However, most existing methods assume that large amounts of labeled historical data are available for prediction, whereas projects in the early stages of their life cycle may lack the data needed to build such predictors. In addition, most existing techniques use static code metrics as predictors and omit change information, even though changes may introduce risks into software development. In this paper, we take these two issues into consideration and propose a semi-supervised defect prediction approach, extRF. extRF extends the classical supervised Random Forest algorithm with a self-training paradigm. It also employs change burst information to improve the accuracy of software defect prediction. We conduct an experiment to evaluate extRF against three supervised machine learners (i.e., Logistic Regression, Naïve Bayes, and Random Forest) and to compare the effectiveness of code metrics, change burst metrics, and a combination of the two. Experimental results show that extRF trained with a small labeled dataset achieves performance comparable to that of some supervised learning approaches trained with a larger labeled dataset. When only 2% of the Eclipse 2.0 data are used for training, extRF achieves an F-measure of about 0.562, close to that of LR (a supervised learning approach) at a labeled sampling rate of 50%. In addition, change burst metrics outperform code metrics: the F-measure rises to a peak value of 0.75 for Eclipse 3.0 and JDT.Core.
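The self-training idea and the F-measure named in the abstract can be illustrated with a minimal sketch. It is not the authors' extRF implementation: the abstract does not give the confidence threshold, the stopping rule, or the feature set, so `threshold`, `max_rounds`, and the helper names `self_train` and `f_measure` are assumptions made here. To stay dependency-free, a toy `CentroidClassifier` stands in for Random Forest; any learner exposing `fit` and `predict_proba` could presumably be passed in its place.

```python
class CentroidClassifier:
    """Toy stand-in learner (NOT Random Forest), used only for illustration.

    Assumes binary labels 0 (clean) and 1 (defective), both present in y.
    """

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            # Euclidean distance to each class centroid.
            d = [sum((a - b) ** 2 for a, b in zip(x, self.centroids[l])) ** 0.5
                 for l in (0, 1)]
            total = d[0] + d[1]
            # The farther from centroid 0, the higher the score for class 1.
            p1 = 0.5 if total == 0 else d[0] / total
            out.append([1.0 - p1, p1])
        return out


def self_train(make_model, X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop (threshold and max_rounds are assumed values).

    Each round: fit on the labeled set, pseudo-label the unlabeled samples the
    model is confident about, move them into the labeled set, and refit.
    Returns the final model and the number of samples left unlabeled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = make_model().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if not pool:
            break
        probs = model.predict_proba(pool)
        remaining, added = [], 0
        for x, p in zip(pool, probs):
            if max(p[0], p[1]) >= threshold:
                X_lab.append(x)
                y_lab.append(1 if p[1] >= p[0] else 0)
                added += 1
            else:
                remaining.append(x)
        if added == 0:  # nothing confident enough: stop early
            break
        pool = remaining
        model = make_model().fit(X_lab, y_lab)
    return model, len(pool)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A call such as `self_train(CentroidClassifier, X_lab, y_lab, X_unlab)` starts from a small labeled set and grows it with confident pseudo-labels, which is the general mechanism that lets a semi-supervised learner work with few labels. Retraining from scratch each round keeps the sketch simple; other update strategies are possible.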
Measurement, Software, Predictive models, Vegetation, Data models, Supervised learning, Training
Q. He, B. Shen and Y. Chen, "Software Defect Prediction Using Semi-Supervised Learning with Change Burst Information," 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Atlanta, GA, USA, 2016, pp. 113-122.