Discovering Neglected Conditions in Software by Mining Dependence Graphs
IEEE Transactions on Software Engineering, Sept.-Oct. 2008 (vol. 34, no. 5)
pp. 579-596
Ray-Yaung Chang, Case Western Reserve University, Cleveland
Andy Podgurski, Case Western Reserve University, Cleveland
Jiong Yang, Case Western Reserve University, Cleveland
Neglected conditions are an important but difficult-to-find class of software defects. This paper presents a novel approach to revealing neglected conditions that integrates static program analysis and advanced data mining techniques to discover implicit conditional rules in a code base and to discover rule violations that indicate neglected conditions. The approach requires the user to indicate minimal constraints on the context of the rules to be sought, rather than specific rule templates. To permit this generality, rules are modeled as graph minors of enhanced procedure dependence graphs (EPDGs), in which control and data dependence edges are augmented by edges representing shared data dependences. A heuristic maximal frequent subgraph mining algorithm is used to extract candidate rules from EPDGs, and a heuristic graph matching algorithm is used to identify rule violations. We also report the results of an empirical study in which the approach was applied to four open source projects (openssl, make, procmail, amaya). These results indicate that the approach is effective and reasonably efficient.
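The rule-then-violation idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: the paper mines maximal frequent subgraphs (graph minors) of EPDGs and uses heuristic graph matching, whereas this toy only counts labelled dependence edges across small hand-written graphs. The function names, the edge labels, and the malloc example are all assumptions made for illustration.

```python
from collections import Counter

def mine_frequent_edges(graphs, min_support):
    """Return the labelled edges found in at least min_support of the graphs.

    Toy stand-in for frequent subgraph mining: the "rule" is just a set of
    frequent edges, not a connected subgraph or graph minor.
    """
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    threshold = min_support * len(graphs)
    return {edge for edge, n in counts.items() if n >= threshold}

def find_violations(graphs, rule):
    """Map each graph that matches only part of the rule to the edges it lacks."""
    violations = {}
    for name, edges in graphs.items():
        missing = rule - edges
        # Report partial matches only; a graph sharing no edge with the
        # rule is treated as unrelated rather than as a violation.
        if missing and len(missing) < len(rule):
            violations[name] = missing
    return violations

# Hypothetical dependence edges: (source node, dependence kind, target node).
CHECK = ("call malloc", "data", "ptr != NULL")
GUARD = ("ptr != NULL", "control", "use ptr")
USE = ("call malloc", "data", "use ptr")

# Four hypothetical call sites; site_d uses the pointer without the NULL check.
graphs = {
    "site_a": frozenset({CHECK, GUARD, USE}),
    "site_b": frozenset({CHECK, GUARD, USE}),
    "site_c": frozenset({CHECK, GUARD, USE}),
    "site_d": frozenset({USE}),
}

rule = mine_frequent_edges(graphs, min_support=0.75)
violations = find_violations(graphs, rule)
for name, missing in sorted(violations.items()):
    print(name, "is missing", sorted(missing))
```

Here the check and guard edges appear in three of four sites, so they become part of the candidate rule, and the one site lacking them is flagged as a possible neglected condition. A real implementation would need structural matching rather than label equality, which is where the subgraph mining and graph matching heuristics described above come in.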

Index Terms:
Methods for SQA and V&V, Pre- and post-conditions
Citation:
Ray-Yaung Chang, Andy Podgurski, Jiong Yang, "Discovering Neglected Conditions in Software by Mining Dependence Graphs," IEEE Transactions on Software Engineering, vol. 34, no. 5, pp. 579-596, Sept.-Oct. 2008, doi:10.1109/TSE.2008.24