IEEE Transactions on Software Engineering, vol. 39, no. 9, Sept. 2013, pp. 1208-1215
Martin Shepperd , Brunel University, Uxbridge
Qinbao Song , Xi'an Jiaotong University, Xi'an
Zhongbin Sun , Xi'an Jiaotong University, Xi'an
Carolyn Mair , Southampton Solent University, Southampton
ABSTRACT
Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and upon using the same, rather than merely similar, versions of datasets. In recent years there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been used extensively in this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets, and we compare the two versions of the datasets currently in use. Results: We find important differences between the two versions of the datasets, implausible values in one dataset, and generally insufficient documentation of dataset preprocessing. Conclusions: We recommend that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.
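The recommendation to understand the data before applying learners can be made concrete with a few simple integrity checks. The sketch below, in Python with pandas, screens a defect dataset for duplicate rows, implausible values, and missing values; the column names (LOC_TOTAL, CYCLOMATIC_COMPLEXITY) and the specific rules are illustrative assumptions, not the cleaning criteria defined by the authors.

# Illustrative data-quality screening for a defect dataset (e.g., a CSV export
# of a NASA MDP project). Column names and rules are assumptions made for this
# sketch, not the checks applied in the paper.
import pandas as pd

def screen(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Exact duplicate rows (identical features and label) can inflate apparent
    # classifier performance when they straddle train/test splits.
    n_duplicates = int(df.duplicated().sum())

    # Implausible values, e.g., a module with zero lines of code but non-zero
    # cyclomatic complexity, or any negative count.
    implausible = ((df["LOC_TOTAL"] == 0) & (df["CYCLOMATIC_COMPLEXITY"] > 0)) | (
        df.select_dtypes("number") < 0
    ).any(axis=1)

    # Missing values, which some learners silently impute or drop.
    n_missing = int(df.isna().any(axis=1).sum())

    print(f"{path}: {len(df)} rows, {n_duplicates} duplicates, "
          f"{int(implausible.sum())} implausible, {n_missing} with missing values")

    # Return only the rows that pass the checks; whatever is done with the rest
    # is a preprocessing decision that should be reported explicitly.
    return df.loc[~implausible].drop_duplicates().dropna()

Running such a screen on both circulating versions of the same dataset makes differences in size, duplication, and value ranges visible before any learner is trained, which is the kind of provenance and preprocessing detail the authors ask to be reported.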
INDEX TERMS
NASA, defect prediction, empirical software engineering, data quality, machine learning
CITATION
Martin Shepperd, Qinbao Song, Zhongbin Sun, and Carolyn Mair, "Data Quality: Some Comments on the NASA Software Defect Datasets," IEEE Transactions on Software Engineering, vol. 39, no. 9, pp. 1208-1215, Sept. 2013, doi: 10.1109/TSE.2013.11.
REFERENCES
[1] S. Armstrong, "Significance Tests Harm Progress in Forecasting," Int'l J. Forecasting, vol. 23, no. 2, pp. 321-327, 2007.
[2] J. Bezdek, J. Keller, R. Krishnapuram, L. Kuncheva, and N. Pal, "Will the Real Iris Data Please Stand Up?" IEEE Trans. Fuzzy Systems, vol. 7, no. 3, pp. 368-369, June 1999.
[3] G. Boetticher, Improving Credibility of Machine Learner Models in Software Engineering, pp. 52-72. Idea Group Inc., 2007.
[4] C. Catal and B. Diri, "A Systematic Review of Software Fault Prediction Studies," Expert Systems with Applications, vol. 36, no. 4, pp. 7346-7354, 2009.
[5] W. Fan, F. Geerts, and X. Jia, "A Revival of Integrity Constraints for Data Cleaning," Proc. VLDB Endowment, vol. 1, no. 2, pp. 1522-1523, Aug. 2008.
[6] D. Gray, D. Bowes, N. Davey, Y. Sun, and B. Christianson, "The Misuse of the NASA Metrics Data Program Data Sets for Automated Software Defect Prediction," Proc. 15th Ann. Conf. Evaluation and Assessment in Software Eng., 2011.
[7] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, "A Systematic Literature Review on Fault Prediction Performance in Software Engineering," IEEE Trans. Software Eng., vol. 38, no. 6, pp. 1276-1304, Nov./Dec. 2012.
[8] D. Ince, L. Hatton, and J. Graham-Cumming, "The Case for Open Computer Programs," Nature, vol. 482, no. 7386, pp. 485-488, 2012.
[9] Y. Jiang, B. Cukic, and T. Menzies, "Fault Prediction Using Early Lifecycle Data," Proc. 18th IEEE Int'l Symp. Software Reliability Eng., pp. 237-246, 2007.
[10] K. Kaminsky and G. Boetticher, "Building a Genetically Engineerable Evolvable Program (GEEP) Using Breadth-Based Explicit Knowledge for Predicting Software Defects," Proc. IEEE Ann. Meeting Fuzzy Information Processing Soc., pp. 10-15, 2004.
[11] J. Keung, E. Kocaguneli, and T. Menzies, "Finding Conclusion Stability for Selecting the Best Effort Predictor in Software Effort Estimation," Automated Software Eng., vol. 20, pp. 543-567, 2013.
[12] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," IEEE Trans. Software Eng., vol. 34, no. 4, pp. 485-496, July/Aug. 2008.
[13] G. Liebchen, "Data Cleaning Techniques for Software Engineering Data Sets," doctoral thesis, Brunel Univ., 2011.
[14] G. Liebchen and M. Shepperd, "Data Sets and Data Quality in Software Engineering," Proc. Int'l Workshop Predictor Models in Software Eng., 2008.
[15] Y. Liu, T. Khoshgoftaar, and N. Seliya, "Evolutionary Optimization of Software Quality Modeling with Multiple Repositories," IEEE Trans. Software Eng., vol. 36, no. 6, pp. 852-864, Nov./Dec. 2010.
[16] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Trans. Software Eng., vol. 33, no. 1, pp. 2-13, Jan. 2007.
[17] T. Menzies and M. Shepperd, "Editorial: Special Issue on Repeatable Results in Software Engineering Prediction," Empirical Software Eng., vol. 17, nos. 1/2, pp. 1-17, 2012.
[18] M. Shepperd, "Data Quality: Cinderella at the Software Metrics Ball?" Proc. Second Int'l Workshop Emerging Trends in Software Metrics, pp. 1-4, 2011.
[19] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A General Software Defect-Proneness Prediction Framework," IEEE Trans. Software Eng., vol. 37, no. 3, pp. 356-370, May/June 2011.
[20] V. Stodden, "The Scientific Method in Practice: Reproducibility in the Computational Sciences," MIT Sloan School Working Paper 4773-10, 2010, http://ssrn.com/abstract=1550193, Aug. 2012.
[21] H. Zhang and X. Zhang, "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'," IEEE Trans. Software Eng., vol. 33, no. 9, pp. 635-636, Sept. 2007.