Mining Version Histories to Guide Software Changes
June 2005 (vol. 31 no. 6)
pp. 429-445
We apply data mining to version histories in order to guide programmers along related changes: "Programmers who changed these functions also changed...." Given a set of existing changes, the mined association rules 1) suggest and predict likely further changes, 2) reveal item coupling that is undetectable by program analysis, and 3) can prevent errors due to incomplete changes. After an initial change, our ROSE prototype can correctly predict further locations to be changed; the best predictive power is obtained for changes to existing software. In our evaluation based on the history of eight popular open source projects, ROSE's topmost three suggestions contained a correct location with a likelihood of more than 70 percent.
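
To make the idea of mining association rules from co-changes concrete, here is a minimal Python sketch. It is not ROSE's implementation: it assumes each commit is available as a set of changed items, considers only rules with a single antecedent, and ranks them by the standard support and confidence measures. The function names (mine_rules, suggest), the thresholds, and the item names in the sample history are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions, min_support=2, min_confidence=0.5):
    """Mine rules a -> b from sets of items that were changed together.

    support    = number of transactions that changed both a and b
    confidence = support / number of transactions that changed a
    """
    item_count = Counter()
    pair_count = Counter()
    for changed in transactions:
        items = set(changed)
        item_count.update(items)
        # Count every ordered pair (a, b) with a != b once per transaction.
        pair_count.update(permutations(sorted(items), 2))

    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def suggest(rules, changed_item, top_n=3):
    """Return up to top_n items that were most often changed with changed_item."""
    candidates = [
        (b, support, confidence)
        for (a, b), (support, confidence) in rules.items()
        if a == changed_item
    ]
    # Highest confidence first, then highest support, then name for stability.
    candidates.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return [b for b, _, _ in candidates[:top_n]]


# Hypothetical version history: one set of changed items per commit.
history = [
    {"parse()", "lex()", "Makefile"},
    {"parse()", "lex()"},
    {"parse()", "docs/grammar.txt"},
    {"lex()", "Makefile"},
]

rules = mine_rules(history)
print(suggest(rules, "parse()"))
```

In this toy history, the rule from "Makefile" to "lex()" links a build file to a function, a coupling between item kinds that an analysis of the source code alone would have no way to observe.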

Index Terms:
Programming environments/construction tools, distribution, maintenance, enhancement, configuration management, clustering, classification, association rules, data mining.
Citation:
Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller, "Mining Version Histories to Guide Software Changes," IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 429-445, June 2005, doi:10.1109/TSE.2005.72