16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)
Bulk-Synchronous On-Line Crawling on Clusters of Computers
February 13-February 15
ISBN: 978-0-7695-3089-5
This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. On-line updates come in the form of insertions of new documents or updates to existing ones, all of them mixed with the usual user queries. The search engine is bulk-synchronous, which allows it to deal efficiently with the concurrency control problem. The crawler is also bulk-synchronous so that it can be integrated into the same $P$-processor cluster executing the search engine. This paper describes and evaluates the practical feasibility of such a crawler.
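As a rough illustration of why bulk-synchronous processing can simplify concurrency control, the following minimal sketch processes a mixed stream of queries, insertions, and updates in supersteps. It is not the authors' implementation: the function name `run_supersteps`, the modulo partitioning of documents across processors, and the per-processor dictionaries are all assumptions made for the example, and the $P$ processors are simulated sequentially in one process.

```python
from collections import defaultdict


def run_supersteps(operations, num_procs, batch_size):
    """Simulate bulk-synchronous handling of mixed operations.

    Each operation is a tuple (kind, doc_id, text) with kind in
    {"query", "insert", "update"}. All names here are hypothetical.
    """
    # One local document store per simulated processor (assumption).
    stores = [dict() for _ in range(num_procs)]
    results = []
    for start in range(0, len(operations), batch_size):
        batch = operations[start:start + batch_size]
        # Communication phase: route each operation to the processor
        # that owns the document (modulo partitioning is an assumption).
        inboxes = defaultdict(list)
        for op in batch:
            inboxes[op[1] % num_procs].append(op)
        # Computation phase: each processor drains its own inbox in
        # arrival order and touches only its own store, so no locks
        # are needed.
        for p in range(num_procs):
            for kind, doc_id, text in inboxes[p]:
                if kind == "query":
                    results.append((doc_id, stores[p].get(doc_id)))
                else:  # "insert" and "update" both overwrite the entry
                    stores[p][doc_id] = text
        # The end of this loop iteration stands in for the barrier
        # synchronization that closes a superstep.
    return results
```

In this sketch, a query that follows an update to the same document always observes the update, because both are routed to the same processor and handled in order within or across supersteps.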
Index Terms:
parallel computing, Web crawling
Citation:
Mauricio Marin, Carolina Bonacic, "Bulk-Synchronous On-Line Crawling on Clusters of Computers," 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), pp. 414-421, 2008