2013 IEEE International Conference on Software Maintenance (2013)
Eindhoven, Netherlands
Sept. 22, 2013 to Sept. 28, 2013
ISSN: 1063-6773
pp: 240-249
Tao Wang, Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Huaimin Wang, Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Gang Yin, Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Charles X. Ling, Dept. of Comput. Sci., Univ. of Western Ontario, London, ON, Canada
Xiang Li, Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Peng Zou, Acad. of Equip., Beijing, China
ABSTRACT
The large number of software repositories on the Internet is fundamentally changing the traditional paradigms of software maintenance. Efficiently categorizing the massive number of projects in these repositories, so that relevant software can be retrieved, is vital for Internet-based maintenance tasks such as searching for solutions and learning best practices. Much previous work has categorized software by mining source code or byte code, but it has been verified only on relatively small collections of projects with coarse-grained categories or clusters. Internet-based software maintenance, however, requires finer-grained, more scalable, and language-independent categorization approaches. In this paper, we propose a novel approach that hierarchically categorizes software projects based on their online profiles across multiple repositories. We design an SVM-based categorization framework to classify the massive number of software projects hierarchically. To improve categorization performance, we aggregate different types of profile attributes from multiple repositories and design a weighted combination strategy that assigns greater weights to more important attributes. Extensive experiments are carried out on more than 18,000 projects across three repositories. The results show that weighted combination yields significant improvements, and that overall precision, recall, and F-Measure can reach 71.41%, 65.60%, and 68.38%, respectively, in appropriate settings. Compared with previous work, our approach presents competitive results with 123 finer-grained, multi-layered categories. In contrast to approaches that use source code or byte code, our approach is more effective for large-scale and language-independent software categorization.
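The three figures reported in the abstract are mutually consistent if F-Measure is taken to be the usual harmonic mean of precision and recall. The abstract does not state the formula, so that definition is an assumption here, and the function name `f_measure` is purely illustrative. A minimal check in Python:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score).

    Assumption: the abstract's "F-Measure" is this standard definition;
    the abstract itself does not spell out the formula.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 71.41%, recall 65.60%.
reported_precision = 0.7141
reported_recall = 0.6560

score = f_measure(reported_precision, reported_recall)
print(f"{score:.2%}")  # prints 68.38%
```

Under this assumption the computed value agrees with the reported 68.38% to two decimal places.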
INDEX TERMS
Databases, Collaboration, Servers, Software maintenance, Internet, Data mining, Hierarchical Categorization, Software Repository, Software Profile
CITATION
Tao Wang, Huaimin Wang, Gang Yin, Charles X. Ling, Xiang Li, Peng Zou, "Mining Software Profile across Multiple Repositories for Hierarchical Categorization", 2013 IEEE International Conference on Software Maintenance, vol. 00, pp. 240-249, 2013, doi:10.1109/ICSM.2013.35