Issue No.02 - March-April (2012 vol.38)

pp: 425-438

A. Bener, Ted Rogers Sch. of Inf. Technol. Manage., Ryerson Univ., Toronto, ON, Canada

J. W. Keung, Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China

T. Menzies, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA

E. Kocaguneli, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2011.27

ABSTRACT

Background: There are too many design options for software effort estimators. How can we best explore them all?

Aim: We seek general principles of effort estimation that can guide the design of effort estimators.

Method: We identified the essential assumption of analogy-based effort estimation, i.e., that the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees with that of their smaller subtrees.

Results: For 10 data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: 1) the estimation variance of cluster subtrees is usually larger than that of cluster supertrees; 2) if analogy is restricted to the cluster trees with lower variance, then effort estimates have significantly lower error than nearest neighbor methods that use neighborhoods of a fixed size (measured using MRE, AR, and Pred(25), with a Wilcoxon test at 95 percent confidence).

Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
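The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the binary cluster tree is built, so the sketch assumes a simple top-down bisection around two far-apart pivot projects, Euclidean distance over numeric features, and the median effort of the selected cluster as the estimate. All names (`Node`, `build_tree`, `estimate`, `mre`, `pred`) and the toy data are hypothetical. The estimator walks from the root toward the query project and stops as soon as the next subtree has higher effort variance than the current one, so the neighborhood size is chosen dynamically rather than fixed in advance.

```python
from math import dist              # Euclidean distance, Python 3.8+
from statistics import median, pvariance


class Node:
    """One cluster in the binary tree; rows are (features, effort) pairs."""

    def __init__(self, rows):
        self.rows = rows
        self.variance = pvariance([effort for _, effort in rows])
        self.left = None
        self.right = None


def centroid(rows):
    """Mean feature vector of a cluster."""
    n = len(rows)
    return [sum(f[i] for f, _ in rows) / n for i in range(len(rows[0][0]))]


def build_tree(rows, min_size=2):
    """Assumed construction: split each cluster around two distant pivots."""
    node = Node(rows)
    if len(rows) < 2 * min_size:
        return node
    a = max(rows, key=lambda r: dist(r[0], rows[0][0]))   # far from first row
    b = max(rows, key=lambda r: dist(r[0], a[0]))         # far from pivot a
    left = [r for r in rows if dist(r[0], a[0]) <= dist(r[0], b[0])]
    right = [r for r in rows if dist(r[0], a[0]) > dist(r[0], b[0])]
    if len(left) < min_size or len(right) < min_size:
        return node                                       # cannot split further
    node.left = build_tree(left, min_size)
    node.right = build_tree(right, min_size)
    return node


def estimate(node, query):
    """Descend toward the query while the subtree variance does not increase."""
    while node.left is not None:
        child = min((node.left, node.right),
                    key=lambda c: dist(centroid(c.rows), query))
        if child.variance > node.variance:
            break                      # subtree is less stable than its supertree
        node = child
    return median(effort for _, effort in node.rows)


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def pred(actuals, predictions, level=25):
    """Pred(level): percentage of estimates whose MRE is at most level percent."""
    hits = sum(mre(a, p) <= level / 100 for a, p in zip(actuals, predictions))
    return 100 * hits / len(actuals)


# Toy data (invented for illustration): one stable region, one noisy region.
projects = [
    ([1.0, 1.0], 10.0), ([1.2, 0.9], 12.0), ([0.9, 1.1], 11.0), ([1.1, 1.0], 10.0),
    ([8.0, 8.0], 100.0), ([8.2, 7.9], 300.0), ([7.9, 8.1], 50.0), ([8.1, 8.0], 500.0),
]
tree = build_tree(projects)
print(estimate(tree, [1.0, 1.05]))   # query in the stable region
print(estimate(tree, [8.0, 8.0]))    # query in the noisy region
```

In this toy data the first query descends to a two-project leaf, because every subtree on its path has lower effort variance than its parent. The second query stays at the root, because the cluster around it has higher effort variance than the whole data set. A fixed-size nearest neighbor method would treat both queries the same way.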

INDEX TERMS

trees (mathematics), pattern clustering, program testing, project management, software cost estimation, project data, analogy-based effort estimation, software effort estimator design, essential assumption, supertree variance, subtree variance, Coc81 data set, Nasa93 data set, Desharnais data set, Albrecht data set, ISBSG data set, Turkish companies, estimation variance, binary cluster tree, cluster subtrees, dynamic selection, nearest neighbor selection, Estimation, Training, Software, Training data, Linear regression, Euclidean distance, Humans, k-NN, Software cost estimation, analogy

CITATION

A. Bener, J. W. Keung, T. Menzies, E. Kocaguneli, "Exploiting the Essential Assumptions of Analogy-Based Effort Estimation",

*IEEE Transactions on Software Engineering*, vol. 38, no. 2, pp. 425-438, March-April 2012, doi:10.1109/TSE.2011.27