Issue No. 06 - June 2013 (vol. 39)
pp. 822-834
Tim Menzies , West Virginia University, Morgantown
Andrew Butcher , West Virginia University, Morgantown
David Cok , GrammaTech, Ithaca
Andrian Marcus , Wayne State University, Detroit
Lucas Layman , Fraunhofer Center, College Park
Forrest Shull , Fraunhofer Center, College Park
Burak Turhan , University of Oulu, Oulu
Thomas Zimmermann , Microsoft Research, Redmond
ABSTRACT
Existing research is unclear on how to generate lessons learned for defect prediction and effort estimation: should we seek lessons that generalize across multiple projects, or lessons local to particular projects? This paper comparatively evaluates local versus global lessons learned for effort estimation and defect prediction. We applied automated clustering tools to effort and defect datasets from the PROMISE repository, then used rule learners to generate lessons from all the data, from each local project, or from each cluster. The results indicate that lessons learned from combinations of small parts of different data sources (i.e., the clusters) were superior both to generalizations formed over all the data and to local lessons formed from particular projects. We conclude that when researchers attempt to draw lessons from a historical data source, they should 1) ignore any existing local divisions into multiple sources, 2) cluster across all available data, and then 3) restrict the learning of lessons to the clusters from other sources that are nearest to the test data.
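The three-step recipe in the abstract (pool the data across project boundaries, cluster everything, then learn lessons per cluster and apply the lesson of the cluster nearest the test data) can be sketched as follows. This is an illustrative reconstruction, not the paper's method: plain k-means stands in for the paper's FastMap-style clusterer, a majority defect label stands in for the learned rules, and the dataset is synthetic rather than drawn from PROMISE.

```python
import random

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def centroid(points):
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def kmeans(points, k, iters=20):
    """Plain k-means; a stand-in for the paper's clusterer, for illustration only."""
    step = max(1, len(points) // k)
    cents = list(points[::step][:k])     # deterministic, spread-out seeds
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist(p, cents[c]))].append(p)
        cents = [centroid(g) if g else cents[i] for i, g in enumerate(groups)]
    return cents, groups

# Step 1: ignore project boundaries -- pool rows from "two projects" into one
# dataset of (size, complexity) features plus a 0/1 defect label.
# All numbers here are made up.
random.seed(7)
rows = [((random.gauss(10, 2), random.gauss(3, 1)), 0) for _ in range(50)]
rows += [((random.gauss(50, 5), random.gauss(20, 3)), 1) for _ in range(50)]
feats = [f for f, _ in rows]
label = {f: y for f, y in rows}

# Step 2: cluster across all the pooled data.
cents, groups = kmeans(feats, k=2)

# Per-cluster "lesson": here just the majority defect label; the paper
# instead learns rules within each cluster.
lessons = [round(sum(label[f] for f in g) / len(g)) for g in groups]

def predict(x):
    # Step 3: apply the lesson of the cluster nearest the test datum.
    i = min(range(len(cents)), key=lambda c: dist(x, cents[c]))
    return lessons[i]

print(predict((9.0, 3.0)), predict((52.0, 21.0)))
```

In this sketch the per-cluster lesson is deliberately trivial; the point is the control flow, in which the model applied to a test datum comes from the nearest cluster of the pooled data, not from the test datum's own project.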
INDEX TERMS
Estimation, data models, context, Java, telecommunications, measurement, software, effort estimation, data mining, clustering, defect prediction
CITATION
Tim Menzies, Andrew Butcher, David Cok, Andrian Marcus, Lucas Layman, Forrest Shull, Burak Turhan, Thomas Zimmermann, "Local versus Global Lessons for Defect Prediction and Effort Estimation", IEEE Transactions on Software Engineering, vol.39, no. 6, pp. 822-834, June 2013, doi:10.1109/TSE.2012.83