16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)
Bulk-Synchronous On-Line Crawling on Clusters of Computers
February 13-February 15
ISBN: 978-0-7695-3089-5
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PDP.2008.84
This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine that accepts on-line updates concurrently. On-line updates come in the form of insertions of new documents or updates to existing ones, all of them mixed with the usual user queries. The search engine is bulk-synchronous, which allows it to deal efficiently with the concurrency control problem. The crawler is also bulk-synchronous so that it can be integrated into the same $P$-processor cluster that executes the search engine. This paper describes such a crawler and evaluates its practical feasibility.
Index Terms:
parallel computing, Web crawling
Citation:
Mauricio Marin, Carolina Bonacic, "Bulk-Synchronous On-Line Crawling on Clusters of Computers," pdp, pp.414-421, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008