2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC) (2016)
Atlanta, GA, USA
June 10, 2016 to June 14, 2016
ISSN: 0730-3157
ISBN: 978-1-4673-8846-7
pp: 582-591
Thanks to crowdsourcing, StackOverflow, the most popular question-and-answer (Q&A) platform in software engineering, holds a large amount of useful information. This information can be viewed as a large collection of URLs (Uniform Resource Locators), which fall into two categories: URLs of Q&As and URLs in Q&As. The former belong to the StackOverflow domain itself, while the latter point to miscellaneous domains, such as personal blogs. Although each Q&A has been manually assigned tags, the relations between URLs and tags are not clear. In this paper, we propose SOLinker, a method for building semantic links between URLs and tags. First, SOLinker identifies the proper relation between a tag and a URL from a predefined relation set, a task modeled as a text classification problem. Features are extracted from the content of the Q&A, the URL, and the tag list, and the classification algorithm is either Logistic Regression or Gradient Boosting Decision Tree, depending on the category of the URL. Second, there is a partial tagging problem: for a URL in a Q&A, only some of the Q&A's tags relate to that URL. To address this problem, we propose a semantic analysis method that analyzes the URL and its context from both implicit and explicit aspects; SOLinker then infers the proper tags using label propagation. Results show that our method is feasible and practical for constructing semantic links between tags and URLs of/in Q&As. In particular, the F-score of semantic relation identification is around 78%, 5% higher than that of the other existing method, and the F-score of partial tagging resolution is around 88%.
Semantics, Uniform resource locators, Feature extraction, Tagging, Context, Couplings, Data mining
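
The label propagation step mentioned in the abstract can be illustrated with a generic sketch. The abstract does not describe the graph SOLinker builds or its exact propagation rule, so everything below is an assumption made for illustration: the function name `propagate_labels`, the node names (`url_a`, `ctx_1`, and so on), the edge weights, and the clamped-seed update rule are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Generic label propagation over an undirected weighted graph.

    edges: list of (node_a, node_b, weight) tuples (hypothetical similarity links).
    seeds: dict mapping node -> known tag; seed labels stay fixed (clamped).
    Returns a dict mapping each graph node to its highest-scoring tag, or None.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Every node holds a tag -> score distribution; seeds start at 1.0.
    scores = {node: {} for node in neighbors}
    for node, tag in seeds.items():
        scores[node] = {tag: 1.0}

    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            if node in seeds:
                # Clamp known tags so they are never overwritten.
                updated[node] = {seeds[node]: 1.0}
                continue
            # Sum the neighbors' tag scores, weighted by edge strength.
            acc = defaultdict(float)
            for other, w in neighbors[node]:
                for tag, s in scores[other].items():
                    acc[tag] += w * s
            total = sum(acc.values())
            updated[node] = {t: s / total for t, s in acc.items()} if total else {}
        scores = updated

    return {n: (max(d, key=d.get) if d else None) for n, d in scores.items()}


# Hypothetical toy graph: two tagged URLs, one untagged URL, two context nodes.
edges = [
    ("url_a", "ctx_1", 1.0),
    ("ctx_1", "url_b", 0.9),
    ("url_b", "ctx_2", 0.2),
    ("ctx_2", "url_c", 1.0),
]
seeds = {"url_a": "python", "url_c": "java"}
result = propagate_labels(edges, seeds)
print(result["url_b"])
```

In this toy graph the untagged node `url_b` is linked more strongly to the context it shares with `url_a` than to the one it shares with `url_c`, so it ends up with the tag of `url_a`. A real system would derive the graph and its weights from the semantic analysis of each URL and its context.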

