Issue No.03 - May/June (2011 vol.37)
pp: 430-447
Dongsun Kim, Sogang University, Seoul
Xinming Wang, The Hong Kong University of Science and Technology, Hong Kong
Sunghun Kim, The Hong Kong University of Science and Technology, Hong Kong
Andreas Zeller, Saarland University, Saarbrücken
S.C. Cheung, The Hong Kong University of Science and Technology, Hong Kong
Sooyong Park, Sogang University, Seoul
ABSTRACT
Many popular software systems automatically report failures back to the vendors, allowing developers to focus on the most pressing problems. However, it takes a certain period of time to assess which failures occur most frequently. In an empirical investigation of the Firefox and Thunderbird crash report databases, we found that only 10 to 20 crashes account for the large majority of crash reports; predicting these “top crashes” thus could dramatically increase software quality. By training a machine learner on the features of top crashes of past releases, we can effectively predict the top crashes well before a new release. This allows for quick resolution of the most important crashes, leading to improved user experience and better allocation of maintenance efforts.
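The observation that a handful of crashes dominate the report database can be illustrated with a minimal sketch. It assumes each crash report has already been reduced to a signature string; the function name `top_crash_coverage` and the sample data are hypothetical and are not taken from the paper or from the Mozilla crash databases.

```python
from collections import Counter


def top_crash_coverage(signatures, k):
    """Return the k most frequent crash signatures and the
    fraction of all reports that they account for."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return [sig for sig, _ in top], covered / total


# Hypothetical data: 100 reports spread over five crash signatures.
reports = (
    ["sig_A"] * 60 + ["sig_B"] * 25 + ["sig_C"] * 10
    + ["sig_D"] * 3 + ["sig_E"] * 2
)

top, share = top_crash_coverage(reports, 2)
print(top, share)
```

With this made-up distribution, the two most frequent signatures cover most of the reports. This sketch only ranks crashes after the reports have arrived; the approach described in the abstract instead trains a learner on features of past top crashes so that the ranking can be predicted before a new release.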
INDEX TERMS
Top crash, machine learning, crash reports, social network analysis, data mining.
CITATION
Dongsun Kim, Xinming Wang, Sunghun Kim, Andreas Zeller, S.C. Cheung, Sooyong Park, "Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts", IEEE Transactions on Software Engineering, vol. 37, no. 3, pp. 430-447, May/June 2011, doi:10.1109/TSE.2011.20