Exploiting Interclass Rules for Focused Crawling
November/December 2004 (vol. 19 no. 6)
pp. 66-73
Ismail Sengör Altingövde, Bilkent University
Özgür Ulusoy, Bilkent University
A focused crawler is an agent that concentrates on a particular target topic and tries to visit and gather only relevant pages from the Web. A crucial issue for a focused crawler is the underlying heuristic for deciding the page to visit next. The authors propose a rule-based approach to improve a baseline focused crawler's harvest rate and coverage. The baseline focused crawler employs a canonical topic taxonomy to train a naïve Bayesian classifier, which then helps score unseen URLs. The authors explore using simple rules derived from interclass (topic) linkage patterns to decide the crawler's next move. The rule-based approach also enhances the baseline crawler in supporting tunneling. In initial performance results, the rule-based crawler improved the harvest rate and coverage of the baseline crawler.
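As a rough illustration of the idea summarized above, the following Python sketch estimates interclass rules from labeled links and uses them to score a page's out-links. It is not the authors' implementation: the function and class names (`extract_rules`, `rule_score`, `Frontier`), the rule form P(child class | parent class), and the toy class labels are all assumptions made for this example, and the classifier output is assumed to be given as a dictionary of class probabilities.

```python
import heapq
from collections import defaultdict


def extract_rules(labeled_links):
    """Estimate P(child_class | parent_class) from (parent_class, child_class) pairs.

    Hypothetical rule form: the relative frequency with which pages of one
    class link to pages of another class in a labeled training sample.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for parent_class, child_class in labeled_links:
        counts[parent_class][child_class] += 1
    rules = {}
    for parent_class, children in counts.items():
        total = sum(children.values())
        rules[parent_class] = {c: n / total for c, n in children.items()}
    return rules


def rule_score(class_probs, rules, target):
    """Score the out-links of a page as sum over classes c of P(c | page) * P(target | c).

    class_probs is assumed to come from a classifier such as naive Bayes.
    """
    return sum(
        prob * rules.get(cls, {}).get(target, 0.0)
        for cls, prob in class_probs.items()
    )


class Frontier:
    """Priority queue of unseen URLs; the highest-scoring URL is visited next."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score
```

Under these assumptions, a page classified as off-topic can still give its out-links a nonzero score whenever its class frequently links to the target class, which is one way to picture how such rules could support tunneling.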
Index Terms:
focused Web crawling, tunneling, rule extraction, Web mining, naïve Bayesian classification
Citation:
Ismail Sengör Altingövde, Özgür Ulusoy, "Exploiting Interclass Rules for Focused Crawling," IEEE Intelligent Systems, vol. 19, no. 6, pp. 66-73, Nov.-Dec. 2004, doi:10.1109/MIS.2004.62