IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (2011)
Aug. 22, 2011 to Aug. 27, 2011
In this paper, we present a Web spam detection algorithm based on link analysis. The method consists of three steps: (1) decomposing Web graphs into densely connected subgraphs and computing features for each subgraph; (2) using SVM classifiers to identify subgraphs composed of Web spam; and (3) propagating these predictions over the Web graph with a biased PageRank algorithm, which extends identification beyond the classified subgraphs. We evaluated the method on a public benchmark. An empirical study of the core structure of Web graphs suggests that highly ranked non-spam hosts can be identified by examining the coreness of Web graph elements.
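The following is a minimal sketch of steps (1) and (3), written under stated assumptions rather than as the authors' implementation. The abstract does not say how the dense subgraphs are extracted. Because it mentions coreness, this sketch assumes a k-core decomposition, computed by the standard peeling procedure. The SVM step (2) is omitted, and a hypothetical `seeds` set stands in for the hosts that the classifier flagged as spam. Propagation is written as PageRank with a teleport vector concentrated on those seeds. The helper names `core_numbers` and `biased_pagerank`, the damping factor of 0.85, and the choice of link direction are all illustrative assumptions.

```python
def core_numbers(adj):
    """Coreness (k-core number) of every node in an undirected graph.

    adj maps each node to a set of neighbours. Peeling procedure: repeatedly
    remove a node of minimum remaining degree; a node's coreness is the
    largest minimum degree seen up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def biased_pagerank(out_links, seeds, damping=0.85, iters=100):
    """PageRank whose random jumps land only on seed nodes (assumed: predicted spam).

    out_links maps every node to a list of link targets (all targets must be keys).
    Which link direction to follow for spam propagation is an assumption here.
    """
    nodes = list(out_links)
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(teleport)
    for _ in range(iters):
        new = {v: (1 - damping) * teleport[v] for v in nodes}
        dangling = 0.0
        for v in nodes:
            outs = out_links[v]
            if outs:
                share = damping * score[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                dangling += damping * score[v]
        # Mass from nodes without out-links is sent back to the seeds.
        for v in nodes:
            new[v] += dangling * teleport[v]
        score = new
    return score


# Toy undirected graph: a 4-clique {a, b, c, d} with a short tail a-e-f.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a", "f"},
    "f": {"e"},
}
core = core_numbers(adj)
dense = sorted(v for v, c in core.items() if c >= 3)
print(dense)  # → ['a', 'b', 'c', 'd']

# Toy directed graph: seed "s" sits on a cycle s -> x -> y -> s; "z" links in.
links = {"s": ["x"], "x": ["y"], "y": ["s"], "z": ["y"]}
scores = biased_pagerank(links, seeds={"s"})
```

In this sketch, nodes in the highest cores form the candidate dense subgraphs. In the full method, features would be computed for each of these subgraphs before classification. In the propagation step, scores decay with link distance from the seeds. A node such as `z`, which is unreachable from any seed, scores zero.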
Web spam, dense subgraphs, biased PageRank
K. Inui, Y. Kidawara, Y. I. Leon-Suematsu and S. Kurohashi, "Web Spam Detection by Exploring Densely Connected Subgraphs," 2011 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (WI-IAT), Lyon, 2011, pp. 124-129.