Issue No.05 - Sept.-Oct. (2012 vol.38)
pp: 1069-1087
Collin McMillan, College of William and Mary, Williamsburg
Mark Grechanik, Accenture Technology Labs, Chicago
Denys Poshyvanyk, College of William and Mary, Williamsburg
Chen Fu, Accenture Technology Labs, Chicago
Qing Xie, Accenture Technology Labs, Chicago
ABSTRACT
A fundamental problem in finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and the low-level implementation details of applications. To reduce this mismatch, we created an approach called EXEcutable exaMPLes ARchive (Exemplar) for finding highly relevant software projects in large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, datasets), Exemplar retrieves applications that implement these concepts. Exemplar ranks applications in three ways. First, we consider the descriptions of applications. Second, we examine the Application Programming Interface (API) calls used by applications. Third, we analyze the dataflow among those API calls. We performed two case studies (with professional and student developers) to evaluate how these three rankings contribute to the quality of the search results from Exemplar. The results of our studies show that the combined ranking of application descriptions and API documents yields the most relevant search results. We released Exemplar and our case study data to the public.
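The following minimal Python sketch makes the three rankings concrete. It scores a query against an application's description, against the API calls the application makes, and against the dataflow among those calls, then combines the three scores as a weighted sum. Everything here is an assumption for illustration and not Exemplar's published algorithm. That includes the bag-of-words cosine similarity, the hypothetical `App` record and `rank` function, the `api_docs` mapping from API names to documentation text, the three scoring rules, and the equal default `weights`. The Java API names in the example are real, but their documentation strings are shortened and illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class App:
    """Hypothetical record for one application in the archive."""
    name: str
    description: str
    api_calls: list  # names of API methods the application calls
    dataflow_edges: set = field(default_factory=set)  # (producer, consumer) API pairs


def rank(query: str, apps: list, api_docs: dict, weights=(1 / 3, 1 / 3, 1 / 3)) -> list:
    """Return (name, score) pairs sorted by a weighted sum of three hypothetical scores."""
    q = Counter(tokenize(query))
    # APIs whose documentation shares at least one term with the query.
    relevant = {api for api, doc in api_docs.items() if cosine(q, Counter(tokenize(doc))) > 0}
    results = []
    for app in apps:
        # Score 1: similarity between the query and the application description.
        s_desc = cosine(q, Counter(tokenize(app.description)))
        # Score 2: fraction of the query-relevant APIs that the application calls.
        used = relevant & set(app.api_calls)
        s_api = len(used) / len(relevant) if relevant else 0.0
        # Score 3: fraction of the used relevant APIs joined to another one by dataflow.
        linked = {api for src, dst in app.dataflow_edges
                  if src in used and dst in used for api in (src, dst)}
        s_flow = len(linked) / len(used) if used else 0.0
        score = weights[0] * s_desc + weights[1] * s_api + weights[2] * s_flow
        results.append((app.name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "javax.crypto.Cipher.doFinal": "Encrypts or decrypts data in a single-part operation",
        "java.io.FileInputStream.read": "Reads bytes of data from this input stream file",
    }
    demo = [
        App("CryptoTool", "Encrypt file contents with a password",
            ["javax.crypto.Cipher.doFinal", "java.io.FileInputStream.read"],
            {("java.io.FileInputStream.read", "javax.crypto.Cipher.doFinal")}),
        App("ButtonDemo", "A demo of Swing buttons", ["java.awt.Button.setLabel"]),
    ]
    for name, score in rank("encrypt file data", demo, docs):
        print(f"{name}: {score:.3f}")
```

You could pass `weights=(0.5, 0.5, 0.0)` to drop the dataflow score. That roughly mirrors ranking by descriptions and API documentation alone, which is the combination the abstract reports as most effective. The abstract does not say how Exemplar actually weights or normalizes its scores.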
INDEX TERMS
Search engines, engines, software, Java, cryptography, vocabulary, data mining, software reuse, source code search engines, information retrieval, concept location, open source software, mining software repositories
CITATION
Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Chen Fu, Qing Xie, "Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications," IEEE Transactions on Software Engineering, vol. 38, no. 5, pp. 1069-1087, Sept.-Oct. 2012, doi:10.1109/TSE.2011.84