Issue No. 02 - March-April (2012 vol. 38)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2011.27
A. Bener, Ted Rogers Sch. of Inf. Technol. Manage., Ryerson Univ., Toronto, ON, Canada
J. W. Keung, Dept. of Comput., Hong Kong Polytech. Univ., Kowloon, China
T. Menzies, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
E. Kocaguneli, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation, i.e., that the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees versus smaller subtrees. Results: For 10 data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: 1) the estimation variance of cluster subtrees is usually larger than that of cluster supertrees; 2) if analogy is restricted to the cluster trees with lower variance, then effort estimates have a significantly lower error than nearest neighbor methods that use neighborhoods of a fixed size (measured using MRE, AR, and Pred(25), with a Wilcoxon test at 95 percent confidence). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
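The idea in the Method and Conclusion can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the cluster tree is built, so the median split on the widest feature, the `min_size` parameter, the centroid-based descent, and the use of the median effort as the estimate are all assumptions made for illustration. The sketch builds a binary tree over project data, then descends toward a query project only while the chosen subtree's effort variance does not exceed that of its supertree. The `mre` and `pred25` helpers follow the standard definitions of magnitude of relative error and Pred(25).

```python
from statistics import median, pvariance


class Node:
    """One cluster in the binary tree; rows are (features, effort) pairs."""

    def __init__(self, rows):
        self.rows = rows
        self.efforts = [effort for _, effort in rows]
        # Population variance of effort inside this cluster.
        self.variance = pvariance(self.efforts) if len(rows) > 1 else 0.0
        features = [f for f, _ in rows]
        self.centroid = [sum(col) / len(rows) for col in zip(*features)]
        self.left = None
        self.right = None


def build_tree(rows, min_size=2):
    """Recursively bisect the rows (assumed split rule: median of widest feature)."""
    node = Node(rows)
    if len(rows) < 2 * min_size:
        return node  # too small to split into two clusters of min_size
    features = [f for f, _ in rows]
    widest = max(
        range(len(features[0])),
        key=lambda d: max(f[d] for f in features) - min(f[d] for f in features),
    )
    ordered = sorted(rows, key=lambda r: r[0][widest])
    mid = len(ordered) // 2
    node.left = build_tree(ordered[:mid], min_size)
    node.right = build_tree(ordered[mid:], min_size)
    return node


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate(tree, query):
    """Descend toward the query; stop when the subtree is noisier than its supertree."""
    node = tree
    while node.left is not None:
        child = min((node.left, node.right), key=lambda c: sq_dist(c.centroid, query))
        if child.variance > node.variance:
            break  # the smaller neighborhood is less stable, so stay here
        node = child
    return median(node.efforts)


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def pred25(actuals, predictions):
    """Percentage of estimates whose MRE is at most 0.25."""
    hits = sum(mre(a, p) <= 0.25 for a, p in zip(actuals, predictions))
    return 100.0 * hits / len(actuals)


# Hypothetical toy data: one size feature, effort in arbitrary units.
projects = [
    ([1.0], 10.0), ([2.0], 12.0), ([3.0], 11.0), ([4.0], 13.0),
    ([10.0], 100.0), ([11.0], 400.0), ([12.0], 50.0), ([13.0], 900.0),
]
tree = build_tree(projects)
print(estimate(tree, [1.5]), estimate(tree, [12.4]))
```

In this toy data the small projects form a low-variance region, so the descent reaches a leaf and the neighborhood is small. The large projects form a region noisier than the whole data set, so the descent stops at the root and the neighborhood stays large. The neighborhood size is therefore chosen per query rather than fixed in advance.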
trees (mathematics), pattern clustering, program testing, project management, software cost estimation, project data, analogy-based effort estimation, software effort estimator design, essential assumption, supertree variance, subtree variance, Coc81 data set, Nasa93 data set, Desharnais data set, Albrecht data set, ISBSG data set, Turkish companies, estimation variance, binary cluster tree, cluster subtrees, dynamic selection, nearest neighbor selection, Estimation, Training, Software, Training data, Linear regression, Euclidean distance, Humans, k-NN, Software cost estimation, analogy
A. Bener, J. W. Keung, T. Menzies and E. Kocaguneli, "Exploiting the Essential Assumptions of Analogy-Based Effort Estimation," in IEEE Transactions on Software Engineering, vol. 38, no. 2, pp. 425-438, 2012.