Issue No. 6 - June 2013 (vol. 39)
pp. 757-773
Y. Kamei , Grad. Sch., Kyushu Univ., Fukuoka, Japan
E. Shihab , Dept. of Software Eng., Rochester Inst. of Technol., Rochester, NY, USA
B. Adams , Dépt. de Génie Inf. et Génie Logiciel, École Polytech. de Montréal, Montréal, QC, Canada
A. E. Hassan , Sch. of Comput., Queen's Univ., Kingston, ON, Canada
A. Mockus , Avaya Labs. Res., Basking Ridge, NY, USA
A. Sinha , Mabel's Labels, Dundas, ON, Canada
N. Ubayashi , Grad. Sch., Kyushu Univ., Fukuoka, Japan
ABSTRACT
Defect prediction models are a well-known technique for identifying defect-prone files or packages such that practitioners can allocate their quality assurance efforts (e.g., testing and code reviews). However, once the critical files or packages have been identified, developers still need to spend considerable time drilling down to the functions or even code snippets that should be reviewed or tested. This makes the approach too time-consuming and impractical for large software systems. Instead, we consider defect prediction models that focus on identifying defect-prone (“risky”) software changes instead of files or packages. We refer to this type of quality assurance activity as “Just-In-Time Quality Assurance,” because developers can review and test these risky changes while they are still fresh in their minds (i.e., at check-in time). To build a change risk model, we use a wide range of factors based on the characteristics of a software change, such as the number of added lines and developer experience. A large-scale study of six open source and five commercial projects from multiple domains shows that our models can predict whether or not a change will lead to a defect with an average accuracy of 68 percent and an average recall of 64 percent. Furthermore, when considering the effort needed to review changes, we find that using only 20 percent of the effort it would take to inspect all changes, we can identify 35 percent of all defect-inducing changes. Our findings indicate that “Just-In-Time Quality Assurance” may provide an effort-reducing way to focus on the most risky changes and thus reduce the costs of developing high-quality software.
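The abstract describes two technical ingredients: a classifier built from change-level metrics (e.g., added lines, developer experience) and an effort-aware evaluation that ranks changes by predicted risk per unit of inspection effort. The sketch below is illustrative only, not the authors' code: it fits a logistic regression on synthetic data, and the feature set, labels, and churn-based effort proxy are assumptions made for the example.

```python
# A minimal sketch (not the paper's implementation) of change-level
# ("just-in-time") defect prediction with an effort-aware evaluation.
# All features, labels, and the churn-based effort proxy are synthetic
# assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-change metrics: lines added, lines deleted,
# files touched, developer experience (number of prior changes).
X = rng.exponential(scale=[50, 20, 3, 100], size=(n, 4))
# Hypothetical labels: 1 = change later linked to a defect fix.
y = (rng.random(n) < 0.25).astype(int)

# Fit a risk model and score each change with a defect probability.
model = LogisticRegression(max_iter=1000).fit(X, y)
risk = model.predict_proba(X)[:, 1]

# Effort-aware evaluation: approximate inspection effort by churn
# (added + deleted lines), inspect changes in order of risk per unit
# effort, stop at 20% of total effort, and measure the fraction of
# defect-inducing changes caught within that budget.
effort = X[:, 0] + X[:, 1]
order = np.argsort(-(risk / effort))           # highest risk density first
spent = np.cumsum(effort[order]) / effort.sum()
inspected = order[spent <= 0.20]               # changes within 20% effort
recall_at_20 = y[inspected].sum() / max(y.sum(), 1)
print(f"defect-inducing changes caught at 20% effort: {recall_at_20:.0%}")
```

Ranking by risk per unit effort, rather than by raw risk, is what allows a fixed inspection budget (such as the 20 percent of total effort mentioned in the abstract) to capture a disproportionate share of defect-inducing changes.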
INDEX TERMS
Measurement, Quality assurance, Predictive models, Software, Entropy, Object-oriented modeling, Accuracy, Just-in-time prediction, Maintenance, Software metrics, Mining software repositories, Defect prediction
CITATION
Y. Kamei, E. Shihab, B. Adams, A.E. Hassan, A. Mockus, A. Sinha, N. Ubayashi, "A Large-Scale Empirical Study of Just-in-Time Quality Assurance," IEEE Transactions on Software Engineering, vol. 39, no. 6, pp. 757-773, June 2013, doi:10.1109/TSE.2012.70
REFERENCES
[1] E. Arisholm and L.C. Briand, "Predicting Fault-Prone Components in a Java Legacy System," Proc. ACM/IEEE Int'l Symp. Empirical Software Eng., pp. 8-17, 2006.
[2] E. Arisholm, L.C. Briand, and E.B. Johannessen, "A Systematic and Comprehensive Investigation of Methods to Build and Evaluate Fault Prediction Models," J. Systems and Software, vol. 83, no. 1, pp. 2-17, 2010.
[3] L. Aversano, L. Cerulo, and C. Del Grosso, "Learning from Bug-Introducing Changes to Prevent Fault Prone Code," Proc. Int'l Workshop Principles of Software Evolution, pp. 19-26, 2007.
[4] V.R. Basili, L.C. Briand, and W.L. Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators," IEEE Trans. Software Eng., vol. 22, no. 10, pp. 751-761, Oct. 1996.
[5] C. Bird, N. Nagappan, B. Murphy, H. Gall, and P. Devanbu, "Don't Touch My Code!: Examining the Effects of Ownership on Software Quality," Proc. European Software Eng. Conf. and Symp. Foundations of Software Eng., pp. 4-14, 2011.
[6] L.C. Briand, V.R. Basili, and C.J. Hetmanski, "Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components," IEEE Trans. Software Eng., vol. 19, no. 11, pp. 1028-1044, Nov. 1993.
[7] L.C. Briand, J. Wüst, S.V. Ikonomovski, and H. Lounis, "Investigating Quality Factors in Object-Oriented Designs: An Industrial Case Study," Proc. Int'l Conf. Software Eng., pp. 345-354, 1999.
[8] M. Cataldo, A. Mockus, J.A. Roberts, and J.D. Herbsleb, "Software Dependencies, Work Dependencies, and Their Impact on Failures," IEEE Trans. Software Eng., vol. 35, no. 6, pp. 864-878, Nov./Dec. 2009.
[9] S.R. Chidamber and C.F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
[10] M. D'Ambros, M. Lanza, and R. Robbes, "An Extensive Comparison of Bug Prediction Approaches," Proc. Int'l Working Conf. Mining Software Repositories, pp. 31-41, 2010.
[11] B. Efron, "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation," J. Am. Statistical Assoc., vol. 78, no. 382, pp. 316-331, 1983.
[12] K.E. Emam, W. Melo, and J.C. Machado, "The Prediction of Faulty Classes Using Object-Oriented Design Metrics," J. Systems Software, vol. 56, pp. 63-75, Feb. 2001.
[13] J. Eyolfson, L. Tan, and P. Lam, "Do Time of Day and Developer Experience Affect Commit Bugginess?" Proc. Eighth Working Conf. Mining Software Repositories, pp. 153-162, 2011.
[14] T. Fritz, J. Ou, G.C. Murphy, and E. Murphy-Hill, "A Degree-of-Knowledge Model to Capture Source Code Familiarity," Proc. 32nd ACM/IEEE Int'l Conf. Software Eng., 2010.
[15] T.L. Graves, A.F. Karr, J.S. Marron, and H. Siy, "Predicting Fault Incidence Using Software Change History," IEEE Trans. Software Eng., vol. 26, no. 7, pp. 653-661, July 2000.
[16] P.J. Guo, T. Zimmermann, N. Nagappan, and B. Murphy, "Characterizing and Predicting Which Bugs Get Fixed: An Empirical Study of Microsoft Windows," Proc. Int'l Conf. Software Eng., pp. 495-504, 2010.
[17] T. Gyimothy, R. Ferenc, and I. Siket, "Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction," IEEE Trans. Software Eng., vol. 31, no. 10, pp. 897-910, Oct. 2005.
[18] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, "A Systematic Review of Fault Prediction Performance in Software Engineering," IEEE Trans. Software Eng., vol. 38, no. 6, pp. 1276-1304, Nov./Dec. 2012.
[19] A.E. Hassan, "Predicting Faults Using the Complexity of Code Changes," Proc. Int'l Conf. Software Eng., pp. 16-24, 2009.
[20] I. Herraiz, D.M. German, J.M. Gonzalez-Barahona, and G. Robles, "Towards a Simplification of the Bug Report form in Eclipse," Proc. Int'l Working Conf. Mining Software Repositories, pp. 145-148, 2008.
[21] I. Herraiz, J.M. Gonzalez-Barahona, and G. Robles, "Towards a Theoretical Model for Software Growth," Proc. Fourth Int'l Workshop Mining Software Repositories, p. 21, 2007.
[22] Y. Jiang, B. Cuki, T. Menzies, and N. Bartlow, "Comparing Design and Code Metrics for Software Quality Prediction," Proc. Fourth Int'l Workshop Predictor Models in Software Eng., pp. 11-18, 2008.
[23] Y. Kamei, S. Matsumoto, A. Monden, K. Matsumoto, B. Adams, and A.E. Hassan, "Revisiting Common Bug Prediction Findings Using Effort Aware Models," Proc. Int'l Conf. Software Maintenance, pp. 1-10, 2010.
[24] Y. Kamei, A. Monden, S. Matsumoto, T. Kakimoto, and K. Matsumoto, "The Effects of Over and Under Sampling on Fault-Prone Module Detection," Proc. Int'l Symp. Empirical Software Eng. and Measurement, pp. 196-204, 2007.
[25] T.M. Khoshgoftaar, X. Yuan, and E.B. Allen, "Balancing Misclassification Rates in Classification-Tree Models of Software Quality," Empirical Software Eng., vol. 5, no. 4, pp. 313-330, 2000.
[26] S. Kim, E.J. Whitehead Jr., and Y. Zhang, "Classifying Software Changes: Clean or Buggy?" IEEE Trans. Software Eng., vol. 34, no. 2, pp. 181-196, Mar. 2008.
[27] A.G. Koru, D. Zhang, K.E. Emam, and H. Liu, "An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules," IEEE Trans. Software Eng., vol. 35, no. 2, pp. 293-304, Mar./Apr. 2009.
[28] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," IEEE Trans. Software Eng., vol. 34, no. 4, pp. 485-496, 2008.
[29] M. Leszak, D.E. Perry, and D. Stoll, "Classification and Evaluation of Defects in a Project Retrospective," J. Systems Software, vol. 61, no. 3, pp. 173-187, 2002.
[30] P.L. Li, J. Herbsleb, M. Shaw, and B. Robinson, "Experiences and Results from Initiating Field Defect Prediction and Product Test Prioritization Efforts at ABB Inc.," Proc. Int'l Conf. Software Eng., pp. 413-422, 2006.
[31] C.L. Mallows, "Some Comments on Cp," Technometrics, vol. 42, no. 1, pp. 87-94, 2000.
[32] S. Matsumoto, Y. Kamei, A. Monden, and K. Matsumoto, "An Analysis of Developer Metrics for Fault Prediction," Proc. Int'l Conf. Predictive Models in Software Eng., pp. 18:1-18:9, 2010.
[33] T.J. McCabe, "A Complexity Measure," Proc. Second Int'l Conf. Software Eng., p. 407, 1976.
[34] T. Mende and R. Koschke, "Revisiting the Evaluation of Defect Prediction Models," Proc. Int'l Conf. Predictor Models in Software Eng., pp. 1-10, 2009.
[35] T. Mende and R. Koschke, "Effort-Aware Defect Prediction Models," Proc. European Conf. Software Maintenance and Reeng., pp. 109-118, 2010.
[36] T. Menzies, A. Dekhtyar, J. Distefano, and J. Greenwald, "Problems with Precision: A Response to 'Comments on "Data Mining Static Code Attributes to Learn Defect Predictors"'," IEEE Trans. Software Eng., vol. 33, no. 9, pp. 637-640, Sept. 2007.
[37] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener, "Defect Prediction from Static Code Features: Current Results, Limitations, New Approaches," Automated Software Eng., vol. 17, no. 4, pp. 375-407, 2010.
[38] A. Mockus, "Organizational Volatility and Its Effects on Software Defects," Proc. Int'l Symp. Foundations of Software Eng., pp. 117-126, 2010.
[39] A. Mockus, R.T. Fielding, and J.D. Herbsleb, "Two Case Studies of Open Source Software Development: Apache and Mozilla," ACM Trans. Software Eng. and Methodology, vol. 11, no. 3, pp. 309-346, 2002.
[40] A. Mockus and J. Herbsleb, "Expertise Browser: A Quantitative Approach to Identifying Expertise," Proc. 24th Int'l Conf. Software Eng., 2002.
[41] A. Mockus and D.M. Weiss, "Predicting Risk of Software Changes," Bell Labs Technical J., vol. 5, no. 2, pp. 169-180, 2000.
[42] A. Mockus, P. Zhang, and P.L. Li, "Predictors of Customer Perceived Software Quality," Proc. Int'l Conf. Software Eng., pp. 225-233, 2005.
[43] R. Moser, W. Pedrycz, and G. Succi, "A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction," Proc. Int'l Conf. Software Eng., pp. 181-190, 2008.
[44] J.C. Munson and S.G. Elbaum, "Code Churn: A Measure for Estimating the Impact of Code Change," Proc. Int'l Conf. Software Maintenance, p. 24, 1998.
[45] J.C. Munson and T.M. Khoshgoftaar, "The Detection of Fault-Prone Programs," IEEE Trans. Software Eng., vol. 18, no. 5, pp. 423-433, May 1992.
[46] N. Nagappan and T. Ball, "Use of Relative Code Churn Measures to Predict System Defect Density," Proc. Int'l Conf. Software Eng., pp. 284-292, 2005.
[47] N. Nagappan, T. Ball, and A. Zeller, "Mining Metrics to Predict Component Failures," Proc. Int'l Conf. Software Eng., pp. 452-461, 2006.
[48] N. Nagappan, A. Zeller, T. Zimmermann, K. Herzig, and B. Murphy, "Change Bursts as Defect Predictors," Proc. Int'l Symp. Software Reliability Eng., pp. 309-318, 2010.
[49] N. Ohlsson and H. Alberg, "Predicting Fault-Prone Software Modules in Telephone Switches," IEEE Trans. Software Eng., vol. 22, no. 12, pp. 886-894, Dec. 1996.
[50] T.J. Ostrand, E.J. Weyuker, and R.M. Bell, "Predicting the Location and Number of Faults in Large Software Systems," IEEE Trans. Software Eng., vol. 31, no. 4, pp. 340-355, Apr. 2005.
[51] T.J. Ostrand, E.J. Weyuker, and R.M. Bell, "Programmer-Based Fault Prediction," Proc. Int'l Conf. Predictor Models in Software Eng., pp. 19:1-19:10, 2010.
[52] R. Purushothaman and D.E. Perry, "Toward Understanding the Rhetoric of Small Source Code Changes," IEEE Trans. Software Eng., vol. 31, no. 6, pp. 511-526, June 2005.
[53] J. Ratzinger, T. Sigmund, and H.C. Gall, "On the Relation of Refactorings and Software Defect Prediction," Proc. Int'l Working Conf. Mining Software Repositories, pp. 35-38, 2008.
[54] E.S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O'Reilly & Assoc., Inc., 2001.
[55] E. Shihab, Z.M. Jiang, W.M. Ibrahim, B. Adams, and A.E. Hassan, "Understanding the Impact of Code and Process Metrics on Post-Release Defects: A Case Study on the Eclipse Project," Proc. Int'l Symp. Empirical Software Eng. and Measurement, pp. 4:1-4:10, 2010.
[56] E. Shihab, A. Mockus, Y. Kamei, B. Adams, and A.E. Hassan, "High-Impact Defects: A Study of Breakage and Surprise Defects," Proc. European Software Eng. Conf. and Symp. Foundations of Software Eng., pp. 300-310, 2011.
[57] J. Śliwerski, T. Zimmermann, and A. Zeller, "When Do Changes Induce Fixes?" Proc. Int'l Conf. Mining Software Repositories, pp. 1-5, 2005.
[58] A. Vanya, R. Premraj, and H. van Vliet, "Approximating Change Sets at Philips Healthcare: A Case Study," Proc. European Conf. Software Maintenance and Reeng., pp. 121-130, 2011.
[59] R. Wu, H. Zhang, S. Kim, and S.-C. Cheung, "ReLink: Recovering Links between Bugs and Changes," Proc. European Software Eng. Conf. and Symp. Foundations of Software Eng., pp. 15-25, 2011.
[60] Z. Yin, D. Yuan, Y. Zhou, S. Pasupathy, and L. Bairavasundaram, "How Do Fixes Become Bugs?" Proc. 19th ACM SIGSOFT Symp. and the 13th European Conf. Foundations of Software Eng., pp. 26-36, 2011.
[61] M. Zhou and A. Mockus, "Developer Fluency: Achieving True Mastery in Software Projects," Proc. Int'l Symp. Foundations of Software Eng., pp. 137-146, 2010.
[62] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, "Cross-Project Defect Prediction: A Large Scale Experiment on Data vs. Domain vs. Process," Proc. European Software Eng. Conf. and Symp. Foundations of Software Eng., pp. 91-100, 2009.
[63] T. Zimmermann and P. Weisgerber, "Preprocessing CVS Data for Fine-Grained Analysis," Proc. Int'l Workshop Mining Software Repositories, pp. 2-6, May 2004.