Issue No. 11 - Nov. 2013 (vol. 39)
pp. 1597-1610
Dongsun Kim, The Hong Kong University of Science and Technology, Hong Kong
Yida Tao, The Hong Kong University of Science and Technology, Hong Kong
Sunghun Kim, The Hong Kong University of Science and Technology, Hong Kong
Andreas Zeller, Saarland University, Saarland
ABSTRACT
To support developers in debugging and locating bugs, we propose a two-phase prediction model that uses bug reports' contents to suggest the files likely to be fixed. In the first phase, our model checks whether the given bug report contains sufficient information for prediction. If so, the model proceeds to predict files to be fixed, based on the content of the bug report. In other words, our two-phase model "speaks up" only if it is confident of making a suggestion for the given bug report; otherwise, it remains silent. In the evaluation on the Mozilla "Firefox" and "Core" packages, the two-phase model was able to make predictions for almost half of all bug reports; on average, 70 percent of these predictions pointed to the correct files. In addition, we compared the two-phase model with three other prediction models: the Usual Suspects, the one-phase model, and BugScout. The two-phase model manifests the best prediction performance.
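The "speak up or stay silent" control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or classifiers used in either phase, so the token-count check and keyword-overlap ranking below are hypothetical stand-ins, as are all names (`BugReport`, `two_phase_predict`, `TOY_FILE_KEYWORDS`) and the file paths.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BugReport:
    summary: str
    description: str


def tokenize(report: BugReport) -> List[str]:
    """Lowercase alphabetic tokens from the summary and description."""
    text = f"{report.summary} {report.description}".lower()
    return re.findall(r"[a-z]+", text)


def two_phase_predict(
    report: BugReport,
    is_predictable: Callable[[BugReport], bool],
    predict_files: Callable[[BugReport], List[str]],
    top_k: int = 3,
) -> Optional[List[str]]:
    """Phase 1 decides whether to answer at all; phase 2 ranks files."""
    if not is_predictable(report):
        return None  # remain silent: the report is judged uninformative
    return predict_files(report)[:top_k]


# Hypothetical phase 1: treat very short reports as uninformative.
def toy_is_predictable(report: BugReport, min_tokens: int = 5) -> bool:
    return len(tokenize(report)) >= min_tokens


# Hypothetical phase 2: rank files by keyword overlap with the report.
TOY_FILE_KEYWORDS = {
    "layout/table.cpp": {"table", "layout", "render"},
    "net/http.cpp": {"http", "request", "timeout"},
}


def toy_predict_files(report: BugReport) -> List[str]:
    tokens = set(tokenize(report))
    scores = {path: len(tokens & kw) for path, kw in TOY_FILE_KEYWORDS.items()}
    matched = [path for path, score in scores.items() if score > 0]
    return sorted(matched, key=lambda path: (-scores[path], path))


if __name__ == "__main__":
    vague = BugReport("Crash", "")
    detailed = BugReport(
        "Table layout broken",
        "The table does not render correctly after resize",
    )
    print(two_phase_predict(vague, toy_is_predictable, toy_predict_files))
    print(two_phase_predict(detailed, toy_is_predictable, toy_predict_files))
```

Returning `None` rather than an empty list keeps "the model declined to predict" distinct from "the model predicted and found no matching files", which mirrors the distinction the abstract draws between staying silent and making a suggestion.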
INDEX TERMS
Bug reports, machine learning, patch file prediction
CITATION
Dongsun Kim, Yida Tao, Sunghun Kim, Andreas Zeller, "Where Should We Fix This Bug? A Two-Phase Recommendation Model," IEEE Transactions on Software Engineering, vol. 39, no. 11, pp. 1597-1610, Nov. 2013, doi:10.1109/TSE.2013.24