Issue No.10 - October (2009 vol.20)
Steve Webb, Purewire, Atlanta
James Caverlee, Texas A&M University, College Station
William B. Rouse, Georgia Institute of Technology, Atlanta
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2008.227
Link-based analysis of the Web provides the basis for many important applications (such as Web search, Web-based data mining, and Web page categorization) that bring order to the massive amount of distributed Web content. Because these applications are so heavily relied upon, efforts to manipulate (or spam) the link structure of the Web are on the rise. In this manuscript, we present a parameterized framework for link analysis of the Web that promotes spam resilience through a source-centric view of the Web. We provide a rigorous study of the critical parameters that can impact source-centric link analysis and propose the novel notion of influence throttling for countering link-based manipulation. Through formal analysis and a large-scale experimental study, we show how different parameter settings may impact the time complexity, stability, and spam resilience of Web link analysis. Concretely, we find that the source-centric model supports more effective and robust rankings than existing Web algorithms such as PageRank.
Internet search, information search and retrieval, information storage and retrieval, information technology and systems, distributed systems, systems and software, Web search, general, Web-based services, online information services.
Steve Webb, James Caverlee, William B. Rouse, "A Parameterized Approach to Spam-Resilient Link Analysis of the Web," IEEE Transactions on Parallel & Distributed Systems, vol. 20, no. 10, pp. 1422-1438, October 2009, doi:10.1109/TPDS.2008.227