Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007)
Improvement of PageRank for Focused Crawler
Haier International Training Center, Qingdao, China
July 30-August 1, 2007
ISBN: 0-7695-2909-7
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Focused crawlers are designed to collect web pages relevant to particular topics of interest from the Internet. The PageRank algorithm, which is used to rank web pages, estimates a page's authority from the link structure of the Web. However, it assigns the same weight to every outlink of a page and ignores topics, which leads to topic drift. In this paper, we propose an improved PageRank algorithm, called "T-PageRank", which is based on a "topical random surfer". In our experiments, a focused crawler using T-PageRank performs better than crawlers using the breadth-first and PageRank algorithms.
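The abstract contrasts standard PageRank, in which a random surfer follows each outlink of a page with equal probability, with a topic-aware variant. The Python sketch below shows that contrast under stated assumptions. The function `pagerank`, its `relevance` map, and the rule that weights each outlink by the target page's topical relevance score are hypothetical choices made for illustration. They are not the T-PageRank formula from the paper, which the abstract does not give.

```python
from typing import Dict, List, Optional


def pagerank(
    links: Dict[str, List[str]],
    relevance: Optional[Dict[str, float]] = None,
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank with optional topic-weighted outlinks.

    relevance=None: every outlink gets the same weight (standard PageRank).
    With a relevance map (hypothetical), outlink u -> v is weighted by the
    relevance score of v. This weighting is an illustrative assumption,
    not the paper's T-PageRank definition.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}

    for _ in range(iterations):
        # With probability 1 - damping the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        for page in nodes:
            outlinks = links.get(page, [])
            if not outlinks:
                # Dangling page: spread its rank uniformly over all pages.
                share = damping * rank[page] / n
                for p in nodes:
                    new_rank[p] += share
                continue
            if relevance is None:
                weights = [1.0] * len(outlinks)
            else:
                # Pages missing from the relevance map count as off-topic (0.0).
                weights = [max(relevance.get(t, 0.0), 0.0) for t in outlinks]
                if sum(weights) == 0.0:
                    # No on-topic outlink: fall back to uniform weights.
                    weights = [1.0] * len(outlinks)
            total = sum(weights)
            for target, w in zip(outlinks, weights):
                new_rank[target] += damping * rank[page] * w / total
        rank = new_rank
    return rank


# Usage: page C is assumed to be off-topic, so the topical surfer at A
# always follows the link to B instead of splitting between B and C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
uniform = pagerank(links)
topical = pagerank(links, relevance={"A": 1.0, "B": 1.0, "C": 0.0})
print(uniform)
print(topical)
```

In this small example, B's score rises from about 0.21 under uniform outlink weights to about 0.33 under the topic-weighted rule, because A no longer passes half of its rank to the off-topic page C. Each iteration preserves the total rank of 1, so scores stay comparable across the two settings.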
Index Terms:
focused crawler, PageRank, topical random surfer, T-PageRank
Citation:
Fuyong Yuan, Chunxia Yin, Jian Liu, "Improvement of PageRank for Focused Crawler," snpd, vol. 2, pp.797-802, Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), 2007