Issue No.08 - Aug. (2013 vol.39)
pp: 1040-1053
E. Kocaguneli, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
T. Menzies, Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
J. Keung, Dept. of Comput. Sci., City Univ. of Hong Kong, Hong Kong, China
D. Cok, GrammaTech, Inc., Ithaca, NY, USA
R. Madachy, Naval Postgrad. Sch., Monterey, CA, USA
ABSTRACT
Background: Do we always need complex methods for software effort estimation (SEE)? Aim: To characterize the essential content of SEE data, i.e., the least number of features and instances required to capture the information within SEE data. If the essential content is very small, then 1) the contained information must be very brief and 2) the value added of complex learning schemes must be minimal. Method: Our QUICK method computes the Euclidean distance between rows (instances) and columns (features) of SEE data, then prunes synonyms (similar features) and outliers (distant instances), then assesses the reduced data by comparing predictions from 1) a simple learner using the reduced data and 2) a state-of-the-art learner (CART) using all data. Performance is measured using hold-out experiments and expressed in terms of mean and median MRE, MAR, PRED(25), MBRE, MIBRE, or MMER. Results: For 18 datasets, QUICK pruned 69 to 96 percent of the training data (median = 89 percent). K = 1 nearest neighbor predictions (in the reduced data) performed as well as CART's predictions (using all data). Conclusion: The essential content of some SEE datasets is very small. Complex estimation methods may be overelaborate for such datasets and can be simplified. We offer QUICK as an example of such a simpler SEE method.
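The pipeline described in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact pruning rules, so the rules below are assumptions chosen for illustration. Here a feature counts as a "synonym" if it is some other feature's nearest neighbor, and an instance counts as an "outlier" if it is no other instance's nearest neighbor. All function names are hypothetical. MRE and PRED(25) follow their standard definitions (magnitude of relative error, and the percentage of estimates with MRE at or below 0.25).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def popularity(vectors):
    """Count how often each vector is another vector's nearest neighbor.

    Requires at least two vectors. Ties go to the lowest index.
    """
    counts = [0] * len(vectors)
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        nearest = min(others, key=lambda j: euclidean(v, vectors[j]))
        counts[nearest] += 1
    return counts

def prune_synonyms(rows):
    """Return indices of features to keep.

    Assumed rule: keep features that are no other feature's nearest
    neighbor. If that leaves nothing, keep every feature (also an assumption).
    """
    columns = [list(c) for c in zip(*rows)]  # transpose: features become vectors
    counts = popularity(columns)
    kept = [j for j, c in enumerate(counts) if c == 0]
    return kept or list(range(len(columns)))

def prune_outliers(rows):
    """Return indices of instances to keep.

    Assumed rule: keep instances that are the nearest neighbor of at
    least one other instance; the rest are treated as distant outliers.
    """
    counts = popularity(rows)
    return [i for i, c in enumerate(counts) if c >= 1]

def predict_1nn(query, train_rows, train_efforts):
    """K = 1 nearest neighbor estimate: effort of the closest training row."""
    best = min(range(len(train_rows)),
               key=lambda i: euclidean(query, train_rows[i]))
    return train_efforts[best]

def mre(actual, predicted):
    """Magnitude of relative error for a single estimate."""
    return abs(actual - predicted) / actual

def pred(actuals, predictions, level=25):
    """PRED(level): percentage of estimates whose MRE is at most level/100."""
    hits = sum(1 for a, p in zip(actuals, predictions)
               if mre(a, p) <= level / 100)
    return 100.0 * hits / len(actuals)
```

In practice the features would normally be scaled to a common range before computing distances, so that no single feature dominates; that step is omitted here for brevity. A hold-out experiment would call `prune_synonyms` and `prune_outliers` on the training rows only, then score `predict_1nn` on the held-out rows with `mre` and `pred`.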
INDEX TERMS
Estimation, Indexes, Labeling, Frequency selective surfaces, Euclidean distance, Complexity theory, Principal component analysis, k-NN, Software cost estimation, active learning, analogy
CITATION
E. Kocaguneli, T. Menzies, J. Keung, D. Cok, R. Madachy, "Active learning and effort estimation: Finding the essential content of software effort estimation data", IEEE Transactions on Software Engineering, vol.39, no. 8, pp. 1040-1053, Aug. 2013, doi:10.1109/TSE.2012.88