Issue No.06 - Nov.-Dec. (2012 vol.38)
pp: 1403-1416
Ekrem Kocaguneli, West Virginia University, Morgantown
Tim Menzies, West Virginia University, Morgantown
Jacky W. Keung, The Hong Kong Polytechnic University, Hong Kong
Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models. Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps, rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods. Method: Nine learners were combined with 10 preprocessing options to generate 9 × 10 = 90 solo methods. These were applied to 20 datasets and evaluated using seven error measures. This identified the best n (in our case, n = 13) solo methods that showed stable performance across multiple datasets and error measures. The top 2, 4, 8, and 13 solo methods were then combined to generate 12 multimethods, which were then compared to the solo methods. Results: 1) The top 10 (out of 12) multimethods significantly outperformed all 90 solo methods. 2) The error rates of the multimethods were significantly lower than those of the solo methods. 3) The ranking of the best multimethod was remarkably stable. Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.
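The Method step above can be pictured with a minimal sketch. It assumes details the abstract does not give: the learner and preprocessor labels are placeholders, the effort estimates are made-up numbers, and mean and median are used as example combination schemes only. It is not the authors' implementation.

```python
from itertools import product
from statistics import mean, median

# Placeholder labels: the abstract states only the counts (9 learners, 10 preprocessors).
learners = [f"learner_{i}" for i in range(1, 10)]
preprocessors = [f"prep_{j}" for j in range(1, 11)]

# Every (preprocessor, learner) pair is one solo method: 9 x 10 = 90.
solo_methods = list(product(preprocessors, learners))


def multimethod_estimate(ranked_estimates, top_k, combine=median):
    """Combine the estimates of the top_k best-ranked solo methods.

    ranked_estimates: estimates for one project, ordered from the
    best-ranked solo method to the worst (the ranking is assumed given).
    combine: any function mapping a list of numbers to one number.
    """
    if not 1 <= top_k <= len(ranked_estimates):
        raise ValueError("top_k must be between 1 and the number of estimates")
    return combine(ranked_estimates[:top_k])


# Made-up effort estimates from 13 top-ranked solo methods for one project.
ranked_estimates = [120.0, 135.0, 110.0, 150.0, 128.0, 142.0, 117.0,
                    131.0, 125.0, 138.0, 122.0, 145.0, 133.0]

# Ensemble sizes named in the abstract; the combination schemes are assumptions.
ensemble_sizes = (2, 4, 8, 13)
estimates = {
    (k, scheme.__name__): multimethod_estimate(ranked_estimates, k, scheme)
    for k in ensemble_sizes
    for scheme in (mean, median)
}

for key, value in sorted(estimates.items()):
    print(key, round(value, 2))
```

With two example schemes, this yields 4 × 2 = 8 combinations; the 12 multimethods reported in the abstract imply three combination schemes across the four ensemble sizes.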
Costs, Software performance, Measurement uncertainty, Taxonomy, Machine learning, Regression tree analysis, Support vector machines, Neural networks, k-NN, Software cost estimation, ensemble, regression trees, neural nets, analogy
Ekrem Kocaguneli, Tim Menzies, Jacky W. Keung, "On the Value of Ensemble Effort Estimation", IEEE Transactions on Software Engineering, vol. 38, no. 6, pp. 1403-1416, Nov.-Dec. 2012, doi:10.1109/TSE.2011.111