2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs (MCSoC) (2014)
Sept. 23, 2014 to Sept. 25, 2014
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSoC.2014.10
Numerous critical Internet applications that provide high-quality services, such as Web directories, search engines, Web crawlers, recommendation systems, and user profile detectors, largely depend on efficient and accurate web page classification. Traditional supervised and semi-supervised machine learning methods find it increasingly difficult to keep pace with the explosive growth of Internet information. This paper proposes a web page classification method based on the topological structure of the Wikipedia knowledge network. A kinship-relation association based on content similarity is proposed to solve the imbalance problem that arises when a category node inherits probability from multiple parent nodes. We use N-grams based on Wikipedia words to extract keywords from a web page, and introduce a Bayes classifier to estimate the page class probability. Experimental results show that the proposed method has very good scalability, robustness, and reliability across different web pages.
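The pipeline named in the abstract (N-gram keyword extraction against Wikipedia words, a Bayes classifier over the keywords, and probability inheritance from several parent categories) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the whitespace tokenizer, the Laplace smoothing, and the similarity-weighted average used for inheritance are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def extract_keywords(text, vocabulary, max_n=3):
    """Return word n-grams (1..max_n) from text that occur in vocabulary.

    vocabulary stands in for a set of Wikipedia terms (assumption).
    """
    tokens = text.lower().split()
    found = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in vocabulary:
                found.append(gram)
    return found


def train(labeled_keywords):
    """Count documents and keyword occurrences per category.

    labeled_keywords is a list of (category, [keywords]) pairs.
    """
    doc_counts = Counter()
    term_counts = {}
    for category, keywords in labeled_keywords:
        doc_counts[category] += 1
        term_counts.setdefault(category, Counter()).update(keywords)
    return doc_counts, term_counts


def classify(keywords, doc_counts, term_counts, vocabulary):
    """Naive Bayes posterior over categories with Laplace smoothing."""
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for category, n_docs in doc_counts.items():
        counts = term_counts[category]
        denom = sum(counts.values()) + len(vocabulary)
        log_p = math.log(n_docs / total_docs)
        for kw in keywords:
            log_p += math.log((counts[kw] + 1) / denom)
        log_scores[category] = log_p
    # Normalize in a numerically stable way so the values sum to 1.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def inherit_from_parents(parent_probs, similarities):
    """Combine parent probabilities, weighting each parent by its
    content similarity to the child (hypothetical weighting scheme)."""
    total = sum(similarities[p] for p in parent_probs)
    if total == 0:
        return 0.0
    return sum(parent_probs[p] * similarities[p] / total for p in parent_probs)
```

In this sketch, a child category with two parents does not simply add or average the parents' probabilities; the parent whose content is more similar to the child contributes more, which is one plausible way to avoid the imbalance the abstract mentions.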
Internet, Encyclopedias, Electronic publishing, Web pages, Knowledge based systems, Benchmark testing
H. Li et al., "An Information Classification Approach Based on Knowledge Network," 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs (MCSoC), Aizu-Wakamatsu, Japan, 2014, pp. 3-8.