Issue No.02 - March-April (2012 vol.38)

pp: 375-397

Karel Dejaeger, Katholieke Universiteit Leuven, Leuven

Wouter Verbeke, Katholieke Universiteit Leuven, Leuven

David Martens, University of Antwerp, Antwerp

Bart Baesens, Katholieke Universiteit Leuven, Leuven and University of Southampton, Highfield Southampton

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2011.55

ABSTRACT

A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes such as project size, development, and environment related attributes, typically a significant increase in estimation accuracy can be obtained.
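As an illustration of the best-performing approach named in the abstract (ordinary least squares regression combined with a logarithmic transformation), the following is a minimal sketch and not the authors' implementation. It assumes a single predictor (project size) and a log-log model, log(effort) = b0 + b1 · log(size). The function names `fit_log_ols` and `predict_effort` and the sample data are hypothetical, and the data are synthetic rather than taken from the study.

```python
import math
from statistics import mean


def fit_log_ols(sizes, efforts):
    """Fit log(effort) = b0 + b1 * log(size) by ordinary least squares.

    Uses the closed-form solution for simple (one-predictor) regression.
    Both inputs must be strictly positive, since logarithms are taken.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_effort(intercept, slope, size):
    """Map a log-scale prediction back to the original effort scale."""
    return math.exp(intercept + slope * math.log(size))


# Synthetic, noise-free data following effort = 2.0 * size ** 1.1
# (illustrative values only, not data from the study).
sizes = [10.0, 100.0, 1000.0]
efforts = [2.0 * s ** 1.1 for s in sizes]

b0, b1 = fit_log_ols(sizes, efforts)
print(f"intercept={b0:.4f}, slope={b1:.4f}")
print(f"predicted effort for size 500: {predict_effort(b0, b1, 500.0):.1f}")
```

Fitting on the log scale turns a multiplicative power-law relationship into a linear one, which is one common motivation for pairing OLS with a logarithmic transformation on skewed effort data.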

INDEX TERMS

Data mining, software effort estimation, regression.

CITATION

Karel Dejaeger, Wouter Verbeke, David Martens, Bart Baesens, "Data Mining Techniques for Software Effort Estimation: A Comparative Study",

*IEEE Transactions on Software Engineering*, vol. 38, no. 2, pp. 375-397, March-April 2012, doi:10.1109/TSE.2011.55