2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
A Memory-Efficient Strategy for Exploring the Web
Hong Kong, China
December 18-22, 2006
ISBN: 0-7695-2747-7
Carlos Castillo, Università di Roma "La Sapienza", Italy
Alberto Nelli, Università di Roma "La Sapienza", Italy
Alessandro Panconesi, Università di Roma "La Sapienza", Italy
Search engines rely on Web crawlers to create an index of the Web. Web crawlers explore the Web by downloading pages and finding links to new pages to be explored. At any given moment, a number of pages are waiting in the crawler queue to be downloaded. We study the growth of this queue of pending pages during a crawl of a large subset of the Web. In a normal breadth-first crawler, the queue quickly grows very large.
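
To make the notion of a pending queue concrete, here is a minimal sketch of a breadth-first crawl that records the peak size of its queue. It is an illustration only, not the strategy proposed in the paper: the link graph `toy_web`, the function `bfs_crawl`, and the page names are all hypothetical, and "downloading" a page is simulated by a dictionary lookup.

```python
from collections import deque


def bfs_crawl(graph, seed):
    """Breadth-first crawl of a toy link graph.

    Returns the visit order and the largest size the pending queue reached.
    `graph` maps a page to the list of pages it links to (an assumption made
    for this sketch; a real crawler would fetch and parse each page).
    """
    pending = deque([seed])   # pages discovered but not yet "downloaded"
    seen = {seed}             # pages already queued or visited
    order = []
    peak = len(pending)

    while pending:
        page = pending.popleft()
        order.append(page)
        # Simulated download: look up the outgoing links of the page.
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                pending.append(link)
        peak = max(peak, len(pending))

    return order, peak


# Hypothetical eight-page Web used only for illustration.
toy_web = {
    "a": ["b", "c", "d"],
    "b": ["e", "f"],
    "c": ["f", "g"],
    "d": ["g", "h"],
}

order, peak = bfs_crawl(toy_web, "a")
print(order, peak)
```

Even on this small graph the queue holds half of all pages at its peak, because every visited page can enqueue several new ones before any of them is dequeued.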

We present a strategy for managing the pending queue that reduces its maximum size by 50% while preserving the coverage and quality of the pages visited. This can be applied to general purpose Web crawlers as well as topic-specific crawling, peer-to-peer search, on-demand Web crawling, and other environments in which memory usage has to be kept to a minimum.

Citation:
Carlos Castillo, Alberto Nelli, Alessandro Panconesi, "A Memory-Efficient Strategy for Exploring the Web," wi, pp.680-686, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 2006