First Latin American Web Congress (LA-WEB'03)
Cooperative Crawling
Santiago, Chile
November 10-12, 2003
ISBN: 0-7695-2058-8
Marina Buzzi, IIT-CNR
Web crawler design presents many challenges, including architecture, strategy and performance. One of the most important research topics is improving the selection of pages that are "interesting" to the user, according to importance metrics. Another is content freshness, i.e. keeping temporarily stored copies fresh and consistent with their sources. To this end, the crawler periodically revisits its stored content (the re-crawling process). In this paper, we propose a scheme that lets a crawler acquire information about the global state of a website before crawling begins. The scheme requires the web server to cooperate by collecting and publishing information about its content, which the crawler can use to tune its visit strategy. If this information is unavailable or out of date, the crawler still behaves in the usual manner. In this sense the proposed scheme is non-invasive and independent of any crawling strategy or architecture.
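The abstract does not specify what the published information looks like, so the following is only a minimal sketch of the idea, under stated assumptions. It supposes a hypothetical server-side summary (`SiteState`) listing a last-modification time per URL, and a hypothetical `plan_visit` function that the crawler runs before visiting a site. None of these names, fields, or the one-day staleness threshold come from the paper. The sketch shows the two behaviours the abstract describes: using the published state to narrow the visit, and falling back to an ordinary full re-crawl when the state is missing or out of date.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteState:
    """Hypothetical server-published summary of a site's content (an assumption)."""
    generated_at: float              # when the server produced the summary (epoch seconds)
    last_modified: Dict[str, float]  # URL -> last modification time (epoch seconds)


def plan_visit(
    stored: Dict[str, float],        # URL -> time the crawler last fetched its copy
    state: Optional[SiteState],
    now: float,
    max_state_age: float = 86400.0,  # assumed threshold: ignore summaries older than a day
) -> List[str]:
    """Return the URLs the crawler should fetch on this visit, sorted."""
    # Fallback: with no summary, or a stale one, act in the usual manner
    # and re-crawl every stored page.
    if state is None or now - state.generated_at > max_state_age:
        return sorted(stored)

    to_fetch = set()
    for url, modified in state.last_modified.items():
        fetched = stored.get(url)
        # Fetch pages the crawler has never seen, or that changed since its last fetch.
        if fetched is None or modified > fetched:
            to_fetch.add(url)

    # Stored pages the summary does not mention cannot be judged fresh,
    # so this sketch conservatively re-checks them.
    to_fetch.update(url for url in stored if url not in state.last_modified)
    return sorted(to_fetch)
```

For example, with stored copies of `/a` and `/b` both fetched at time 100, and a current summary reporting `/a` modified at 50, `/b` at 150 and a new page `/c` at 120, the plan contains only `/b` and `/c`. With no summary at all, it contains both stored pages, which matches a conventional re-crawl.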
Citation:
Marina Buzzi, "Cooperative Crawling," First Latin American Web Congress (LA-WEB'03), pp. 209, 2003