A Systematic Study of Failure Proximity
November/December 2008 (vol. 34 no. 6)
pp. 826-843
Chao Liu, Microsoft Corporation, Redmond
Xiangyu Zhang, Purdue University, West Lafayette
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana
Software end users are the best testers: they keep revealing bugs in software that has undergone rigorous in-house testing. To leverage their testing efforts, failure reporting components have been widely deployed in released software. Many uses of the collected failure data depend on an effective failure indexing technique, which, in the optimal case, would index all failures due to the same bug together. Unfortunately, the problem of failure proximity, which underpins the effectiveness of an indexing technique, has not been systematically studied. This article presents the first systematic study of failure proximity. A failure proximity consists of two components: a fingerprinting function that extracts signatures from failures, and a distance function that calculates the likelihood of two failures being due to the same bug. By considering different instantiations of the two functions, we study an array of six failure proximities, two of which are new. These proximities range from the simplest approach, which checks failure points, to the most sophisticated approach, which uses fault localization algorithms to extract failure signatures. Besides presenting the technical details of each proximity, we also study the properties of each proximity and the tradeoffs between proximities. Together, these deliver a systematic view of failure proximity.
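
The two-component structure described in the abstract can be illustrated with a small sketch. This is not the article's implementation: the function names, the representation of a failure as a list of stack-frame strings, and the choice of Jaccard set distance are all assumptions made for illustration. It only shows how a fingerprinting function and a distance function compose into one proximity, with the failure-point fingerprint standing in for the simplest approach the abstract mentions.

```python
from typing import Callable, FrozenSet, Sequence

# Hypothetical representation: a failure is a stack trace, innermost frame first.
Failure = Sequence[str]
Signature = FrozenSet[str]


def failure_point_fingerprint(failure: Failure) -> Signature:
    """Signature is only the location where the failure surfaced (top frame)."""
    return frozenset(failure[:1])


def trace_set_fingerprint(failure: Failure) -> Signature:
    """Signature is the set of all frames on the stack (an assumed variant)."""
    return frozenset(failure)


def jaccard_distance(a: Signature, b: Signature) -> float:
    """1 - |a & b| / |a | b|; 0.0 means identical signatures, 1.0 disjoint ones."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def make_proximity(
    fingerprint: Callable[[Failure], Signature],
    distance: Callable[[Signature, Signature], float],
) -> Callable[[Failure, Failure], float]:
    """Compose a fingerprinting function and a distance function."""
    def proximity(f1: Failure, f2: Failure) -> float:
        return distance(fingerprint(f1), fingerprint(f2))
    return proximity


# Made-up failures for illustration only.
crash_a = ["parse.c:88", "main.c:12"]
crash_b = ["parse.c:88", "io.c:40", "main.c:12"]
crash_c = ["render.c:7", "main.c:12"]

by_point = make_proximity(failure_point_fingerprint, jaccard_distance)
by_trace = make_proximity(trace_set_fingerprint, jaccard_distance)

print(by_point(crash_a, crash_b), by_point(crash_a, crash_c))
print(by_trace(crash_a, crash_b), by_trace(crash_a, crash_c))
```

Swapping either argument of `make_proximity` yields a different proximity, which is the sense in which different instantiations of the two functions can be compared. A smaller distance suggests, under these assumptions, that two failures are more likely due to the same bug.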

Index Terms:
Debugging aids, Dumps
Citation:
Chao Liu, Xiangyu Zhang, Jiawei Han, "A Systematic Study of Failure Proximity," IEEE Transactions on Software Engineering, vol. 34, no. 6, pp. 826-843, Nov.-Dec. 2008, doi:10.1109/TSE.2008.66