Fourth Annual ACIS International Conference on Computer and Information Science (ICIS'05)
AuToCrawler: An Integrated System for Automatic Topical Crawler
Jeju Island, South Korea
July 14-16, 2005
ISBN: 0-7695-2296-3
Jyh-Jong Tsay, National Chung Cheng University
Chen-Yang Shih, National Chung Cheng University
Bo-Liang Wu, National Chung Cheng University
A topical (or focused) crawler is a web crawler that searches for and retrieves pages from the World Wide Web that are related to a specific topic. Rather than downloading every accessible Web page, a topical crawler analyzes the frontier of the crawled region so that it visits only the portion of the web that contains relevant pages, while trying to skip irrelevant regions. This leads to significant savings in both computation and communication resources. In this paper, we present an integrated topical crawler, AuToCrawler. Its main features are a user interest specification module, which mediates between users and search engines to identify target examples and keywords that together specify the topic of interest, and a URL ordering strategy that combines features of several previous approaches and achieves significant improvement. It also provides a graphical user interface through which users can evaluate and visualize the crawling results, which can in turn be used as feedback to reconfigure the crawler.
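The frontier-driven behavior described in the abstract can be illustrated with a generic best-first crawler. This is a minimal sketch, not AuToCrawler's URL ordering strategy, which the abstract does not detail. The in-memory `WEB` graph, the keyword-overlap `relevance` function, and the rule that a link inherits its parent page's score as its priority are all hypothetical assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web" for illustration: URL -> (page text, outgoing links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("focused crawler topic pages", ["a", "b"]),
    "a": ("crawler frontier ordering for topic search", ["c"]),
    "b": ("cooking recipes and pasta", ["d"]),
    "c": ("topic crawler evaluation", []),
    "d": ("crawler topic frontier", []),
}
KEYWORDS: Set[str] = {"crawler", "topic", "frontier"}


def relevance(text: str, keywords: Set[str]) -> float:
    """Hypothetical scorer: fraction of topic keywords that occur in the text."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


def best_first_crawl(seeds: List[str], keywords: Set[str],
                     web: Dict[str, Tuple[str, List[str]]],
                     max_pages: int, threshold: float = 0.0) -> List[str]:
    """Visit pages in order of estimated relevance; return the visit order."""
    # heapq is a min-heap, so priorities are negated; seeds start at 1.0.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    order: List[str] = []
    while frontier and len(order) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        order.append(url)
        text, links = web[url]
        score = relevance(text, keywords)
        # Do not expand pages judged irrelevant: this prunes off-topic regions.
        if score <= threshold:
            continue
        for link in links:
            if link not in visited:
                # Assumption: an unseen link inherits its parent page's score.
                heapq.heappush(frontier, (-score, link))
    return order
```

With `best_first_crawl(["seed"], KEYWORDS, WEB, 10)`, the off-topic page `"b"` is fetched but not expanded, so `"d"` is never visited, which mirrors the idea of skipping irrelevant regions. A real crawler would fetch pages over HTTP, parse their links, respect robots.txt, and use a stronger relevance model than keyword overlap.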
Index Terms:
Topical Crawler, Focused Crawler, Search Engines, Information Retrieval, Machine Learning
Citation:
Jyh-Jong Tsay, Chen-Yang Shih, Bo-Liang Wu, "AuToCrawler: An Integrated System for Automatic Topical Crawler," icis, pp.462-467, Fourth Annual ACIS International Conference on Computer and Information Science (ICIS'05), 2005