2009 6th IEEE International Working Conference on Mining Software Repositories (MSR 2009)
Vancouver, BC, Canada
May 16, 2009 to May 17, 2009
ISBN: 978-1-4244-3493-0
pp: 71-80
Emily Hill, Department of Computer and Information Sciences, University of Delaware, Newark, 19716 USA
K. Vijay-Shanker, Department of Computer and Information Sciences, University of Delaware, Newark, 19716 USA
Eric Enslen, Department of Computer and Information Sciences, University of Delaware, Newark, 19716 USA
Lori Pollock, Department of Computer and Information Sciences, University of Delaware, Newark, 19716 USA
ABSTRACT
Automated software engineering tools (e.g., program search, concern location, code reuse, and quality assessment) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers is to split identifiers into their constituent words. Unlike natural languages, where spaces and punctuation delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm that automatically splits identifiers into sequences of words by mining word frequencies in source code. Using these word frequencies, our identifier splitter applies a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state-of-the-art techniques.
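The two stages the abstract describes (first split on naming conventions such as camel case and non-alphabetic characters, then use mined word frequencies to choose a partition for tokens with no visible boundary) can be sketched as follows. This is a minimal illustration, not the Samurai algorithm: the scoring is a textbook unigram segmentation model swapped in for clarity, and the names `conventional_split` and `UnigramSplitter` as well as the word counts are hypothetical, standing in for frequencies mined from a real code base.

```python
import math
import re
from collections import Counter


def conventional_split(identifier: str) -> list[str]:
    """Split on non-alphabetic characters and camel-case boundaries, lower-casing each part."""
    words = []
    for chunk in re.split(r"[^A-Za-z]+", identifier):
        # "[A-Z]+(?![a-z])" takes an acronym such as "HTML" in "HTMLParser";
        # "[A-Z]?[a-z]+" takes an ordinary word such as "Parser" or "get".
        words.extend(m.lower() for m in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk))
    return words


class UnigramSplitter:
    """Splits a same-case token into the word sequence with the highest unigram log-probability.

    Illustrative scoring only (a classic unigram segmentation model), not Samurai's scoring function.
    """

    def __init__(self, word_counts: Counter):
        self.counts = word_counts
        self.total = sum(word_counts.values())

    def word_score(self, word: str) -> float:
        count = self.counts.get(word, 0)
        if count:
            return math.log10(count / self.total)
        # Unknown words get a probability that shrinks tenfold per character,
        # so the model prefers known words over arbitrary fragments.
        return math.log10(10 / (self.total * 10 ** len(word)))

    def split_token(self, token: str) -> list[str]:
        n = len(token)
        # best[i] holds (score, words) for the best segmentation of token[:i].
        best = [(-math.inf, [])] * (n + 1)
        best[0] = (0.0, [])
        for end in range(1, n + 1):
            for start in range(end):
                score = best[start][0] + self.word_score(token[start:end])
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [token[start:end]])
        return best[n][1]

    def split(self, identifier: str) -> list[str]:
        # Stage 1: naming conventions; stage 2: frequency-based choice within each token.
        result = []
        for word in conventional_split(identifier):
            result.extend(self.split_token(word))
        return result


if __name__ == "__main__":
    # Hypothetical counts standing in for word frequencies mined from source code.
    counts = Counter({"get": 50, "file": 40, "name": 40, "max": 30, "value": 30,
                      "string": 25, "html": 5, "parser": 10, "is": 20, "empty": 8})
    splitter = UnigramSplitter(counts)
    for ident in ["getFileName", "maxvalue", "HTMLParser", "isemptystring"]:
        print(ident, "->", splitter.split(ident))
```

With these counts, `maxvalue` becomes `["max", "value"]` because two frequent words score higher than one unknown eight-letter string. The dynamic program examines every start and end position, so it runs in quadratic time per token, which is cheap for identifier-length strings.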
CITATION
Emily Hill, K. Vijay-Shanker, Eric Enslen, Lori Pollock, "Mining source code to automatically split identifiers for software analysis", 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR 2009), pp. 71-80, 2009, doi:10.1109/MSR.2009.5069482