Reliability and Validity in Comparative Studies of Software Prediction Models
May 2005 (vol. 31, no. 5), pp. 380-391
Empirical studies of software prediction models do not converge on the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we examine a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, such studies compare a machine learning model with a regression model; we make the same comparison, but on simulated data. The results suggest that the research procedure itself is unreliable, and that this unreliability may strongly contribute to the lack of convergence. Our findings thus cast doubt on the conclusions of any study of competing software prediction models that used this research procedure as the basis for model comparison. We therefore need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.
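The procedure the abstract critiques can be made concrete with a small sketch: draw a single sample, run leave-one-out cross validation for a regression model and a machine-learning model (here 1-nearest-neighbour "estimation by analogy"), and compare them with an accuracy indicator such as MMRE. Everything below (the linear data-generating process, the noise level, the sample size) is a hypothetical assumption for illustration, not the paper's actual simulation design; the point is that which model "wins" can flip from one random sample to the next.

```python
import random
import statistics

def simulate_projects(n, seed):
    # Hypothetical data-generating process (assumption for illustration):
    # effort grows linearly with size, plus Gaussian noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        size = rng.uniform(10, 100)              # e.g. KLOC or function points
        effort = 5.0 * size + rng.gauss(0, 40)   # "true" linear relationship
        data.append((size, max(effort, 1.0)))    # keep effort positive
    return data

def ols_fit(train):
    # Ordinary least squares for a single predictor, closed form.
    xs = [s for s, _ in train]
    mx = statistics.mean(xs)
    my = statistics.mean(e for _, e in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda s: a + b * s

def analogy_predict(train, size):
    # 1-nearest-neighbour "estimation by analogy": reuse the effort of
    # the most similar completed project.
    nearest = min(train, key=lambda p: abs(p[0] - size))
    return nearest[1]

def loocv_mmre(data, kind):
    # Leave-one-out cross validation; MMRE = mean(|actual - pred| / actual).
    mres = []
    for i, (size, actual) in enumerate(data):
        train = data[:i] + data[i + 1:]
        pred = ols_fit(train)(size) if kind == "ols" else analogy_predict(train, size)
        mres.append(abs(actual - pred) / actual)
    return statistics.mean(mres)

if __name__ == "__main__":
    # Repeating the whole procedure on independent samples shows how
    # unstable the "which model is best?" verdict can be.
    for seed in range(5):
        sample = simulate_projects(15, seed)
        print(seed, loocv_mmre(sample, "ols"), loocv_mmre(sample, "analogy"))
```

Running the loop over several seeds typically produces MMRE rankings that are not consistent across samples, which is the kind of instability the study attributes to the procedure rather than to the models.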
Index Terms:
Software metrics, cost estimation, cross-validation, empirical methods, arbitrary function approximators, machine learning, estimation by analogy, regression analysis, simulation, reliability, validity, accuracy indicators.
Citation:
Ingunn Myrtveit, Erik Stensrud, Martin Shepperd, "Reliability and Validity in Comparative Studies of Software Prediction Models," IEEE Transactions on Software Engineering, vol. 31, no. 5, pp. 380-391, May 2005, doi:10.1109/TSE.2005.58