Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques
June 2005 (vol. 31 no. 6)
pp. 466-480
We describe a method that uses the source code change history of a software project to drive and refine the search for bugs. Based on data retrieved from the source code repository, we implement a static source code checker that searches for a commonly fixed bug and uses information automatically mined from the repository to refine its results. Applying our tool, we identified 178 warnings that are likely bugs in the Apache Web server source code and 546 warnings that are likely bugs in Wine, an open-source implementation of the Windows API. We show that our technique is more effective than the same static analysis without historical data from the source code repository.

Index Terms: Testing tools, version control, configuration control, debugging aids.
Chadd C. Williams, Jeffrey K. Hollingsworth, "Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques," IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 466-480, June 2005, doi:10.1109/TSE.2005.63