2012 16th Panhellenic Conference on Informatics (2011)
Kastoria, Greece
Sept. 30, 2011 to Oct. 2, 2011
ISBN: 978-0-7695-4389-5
pp: 245-249
In this paper we review and compare focused crawling strategies published during the past decade. Despite giant leaps in communication, storage and computing power in recent years, crawlers have always struggled to keep up with Web content generation and modification. Focused crawlers attempt to i) accelerate the crawling process, ii) maximize the harvest of high-quality pages, and iii) assign appropriate credit to different documents along a crawling path, so that short-term gains are not pursued at the expense of less obvious paths that ultimately yield larger sets of valuable pages. Beyond reviewing and comparing these strategies, we propose extensions to the corresponding architectures as directions for further research.
Focused crawling, adaptive crawling, location-based Web search, context graphs
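
The three goals named in the abstract can be illustrated with a minimal best-first crawler sketch. It is not taken from the paper or from any of the reviewed strategies: the in-memory `TOY_WEB` graph, the keyword-overlap `relevance` function, and the `decay` parameter are all hypothetical and chosen only for illustration. The frontier is a priority queue ordered by credit, and a page passes on to its links either its own relevance or a decayed share of the credit it inherited, so a path through an off-topic page is deprioritized rather than abandoned.

```python
import heapq

# Hypothetical toy web (an assumption for illustration): URL -> (text, links).
TOY_WEB = {
    "seed": ("crawler index", ["a", "b"]),
    "a": ("sports news", ["c"]),
    "b": ("focused crawler survey", []),
    "c": ("focused crawler context graphs", []),
}
TOPIC = {"focused", "crawler"}


def relevance(text):
    """Fraction of topic keywords present in the page text (assumed scorer)."""
    return len(set(text.split()) & TOPIC) / len(TOPIC)


def focused_crawl(seed, budget, decay=0.5):
    """Visit up to `budget` pages, highest-credit frontier entry first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so credit is negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        neg_credit, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        score = relevance(text)
        visited.append((url, score))
        # Credit for outgoing links: the page's own relevance, or a decayed
        # share of inherited credit if the page itself is off-topic.
        credit = max(score, decay * -neg_credit)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Equal credits fall back to URL string order.
                heapq.heappush(frontier, (-credit, link))
    return visited
```

In this toy graph the off-topic page `a` still hands a reduced credit to `c`, which is how the sketch models not sacrificing a less obvious path for short-term gain. A real crawler would replace `TOY_WEB` with fetching and link extraction, and the scorer with a trained classifier or a context-graph model.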
