Data Quality: Some Comments on the NASA Software Defect Datasets
Sept. 2013 (vol. 39 no. 9)
pp. 1208-1215
Martin Shepperd, Brunel University, Uxbridge
Qinbao Song, Xi'an Jiaotong University, Xi'an
Zhongbin Sun, Xi'an Jiaotong University, Xi'an
Carolyn Mair, Southampton Solent University, Southampton
Background--Self-evidently empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research.
Objective--This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable.
Method--We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use.
Results--We find important differences between the two versions of the datasets, implausible values in one dataset and generally insufficient detail documented on dataset preprocessing.
Conclusions--It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.
Index Terms:
NASA, Software, PROM, Educational institutions, Sun, Communities, Abstracts, defect prediction, Empirical software engineering, data quality, machine learning
Citation:
Martin Shepperd, Qinbao Song, Zhongbin Sun, Carolyn Mair, "Data Quality: Some Comments on the NASA Software Defect Datasets," IEEE Transactions on Software Engineering, vol. 39, no. 9, pp. 1208-1215, Sept. 2013, doi:10.1109/TSE.2013.11