Predicting Source Code Changes by Mining Change History
September 2004 (vol. 30 no. 9)
pp. 574-586
Software developers are often faced with modification tasks that involve source code spread across a code base. Some dependencies in source code, such as those between code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to the change history of the code base to determine change patterns (sets of files that were changed together frequently in the past). Our hypothesis is that the change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying it to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced for actual modification tasks on these systems.
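The core idea, mining sets of files that were frequently changed together and using them to suggest related files, can be illustrated with a small sketch. It assumes a hypothetical input format in which each transaction is the set of files committed together in one change, and it counts co-changed file sets by brute force against a minimum-support threshold. This is not the mining algorithm used in the paper; the function names and file names are illustrative only.

```python
from collections import Counter
from itertools import combinations


def mine_change_patterns(transactions, min_support, max_size=3):
    """Return every file set of size 2..max_size that occurs in at least
    min_support transactions, mapped to its support count.

    Brute-force enumeration for illustration only: the number of subsets
    grows combinatorially with transaction size, so a practical miner would
    use an algorithm such as Apriori or FP-growth instead."""
    counts = Counter()
    for tx in transactions:
        files = sorted(tx)  # deterministic ordering of each change's files
        for k in range(2, max_size + 1):
            for combo in combinations(files, k):
                counts[frozenset(combo)] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


def recommend(patterns, starting_files):
    """Recommend files that share a change pattern with all starting files,
    ranked by the highest support of any such pattern (ties broken by name)."""
    start = set(starting_files)
    scores = {}
    for pattern, support in patterns.items():
        if start <= pattern:
            for f in pattern - start:
                scores[f] = max(scores.get(f, 0), support)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical change history: a Java parser and its XML grammar file are
# often committed together, a dependency a single-language analysis may miss.
history = [
    {"Parser.java", "grammar.xml", "Lexer.java"},
    {"Parser.java", "grammar.xml"},
    {"Parser.java", "grammar.xml", "README"},
    {"Lexer.java", "Token.java"},
    {"Lexer.java", "Token.java"},
]
patterns = mine_change_patterns(history, min_support=2)
print(recommend(patterns, ["Parser.java"]))  # → ['grammar.xml']
```

Raising `min_support` trades recall for precision: fewer, more frequently co-changed file sets survive, so recommendations become rarer but are backed by more history.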

Index Terms:
Enhancement, maintainability, clustering, classification, association rules, data mining.
Annie T.T. Ying, Gail C. Murphy, Raymond Ng, Mark C. Chu-Carroll, "Predicting Source Code Changes by Mining Change History," IEEE Transactions on Software Engineering, vol. 30, no. 9, pp. 574-586, Sept. 2004, doi:10.1109/TSE.2004.52