2012 International Conference on Cloud and Service Computing (2013)
Beijing, China
Nov. 4, 2013 to Nov. 6, 2013
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSC.2013.19
Academic ranking for computer science is currently a popular and important problem. This paper introduces the Computer Science Academic Rankings System (CSAR), which aims at extracting, mining, and ranking academic information. We mainly present the approaches used for information extraction and normalization in CSAR. For semi-structured and unstructured web pages, such as paper-view pages, we propose a method that combines a natural language processing n-gram model with web grammar. We generate an optimal matching on a bipartite graph to extract author and organization information with maximum likelihood. CSAR also uses the KM algorithm and the Hungarian algorithm to find the correspondence between authors and emails. For information normalization, we use an n-gram model, the EM algorithm, and a trigram model with linear interpolation to construct a part-of-speech tagger, which is then used to extract useful information from web sources. A TF-IDF model and string edit distance are then applied to normalize organization names. In experiments, the proposed approaches achieve high accuracy and substantially improve academic information extraction.
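As a rough illustration of the last normalization step, the sketch below maps a raw organization string to the closest entry in a list of canonical names using string edit distance only. It is not the CSAR implementation: the abstract combines edit distance with a TF-IDF model, which is omitted here, and the function names `edit_distance` and `normalize_org` as well as the list of canonical names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def normalize_org(raw: str, canonical_names: list[str]) -> str:
    """Return the canonical name closest to `raw` (hypothetical helper).

    Assumption: comparison is case-insensitive and ignores surrounding
    whitespace; no TF-IDF weighting is applied in this sketch.
    """
    key = raw.strip().lower()
    return min(canonical_names, key=lambda name: edit_distance(key, name.lower()))


if __name__ == "__main__":
    # Hypothetical canonical list, for illustration only.
    known = ["Shanghai Jiao Tong University", "Tsinghua University"]
    print(normalize_org("Shanghai Jiaotong Univ.", known))
```

Plain edit distance treats every character equally, so a weighting scheme such as TF-IDF is a natural complement: it lets rare, distinguishing tokens count for more than common words like "University".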
Chengkai Shi, Jiahui Quan, Minglu Li, "Information Extraction for Computer Science Academic Rankings System", 2012 International Conference on Cloud and Service Computing, vol. 00, no. , pp. 69-76, 2013, doi:10.1109/CSC.2013.19