DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIS.2004.62
A focused crawler is an agent that concentrates on a particular target topic and tries to visit and gather only relevant pages from the Web. A crucial issue for a focused crawler is the underlying heuristic for deciding the page to visit next. The authors propose a rule-based approach to improve a baseline focused crawler's harvest rate and coverage. The baseline focused crawler employs a canonical topic taxonomy to train a naïve Bayesian classifier, which then helps score unseen URLs. The authors explore using simple rules derived from interclass (topic) linkage patterns to decide the crawler's next move. The rule-based approach also enhances the baseline crawler in supporting tunneling. In initial performance results, the rule-based crawler improved the harvest rate and coverage of the baseline crawler.
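The idea of scoring URLs with interclass linkage rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule format (an estimated probability that a page of one class links to a page of another), the bounded-depth scoring used to mimic tunneling, and all names (extract_rules, rule_score, Frontier, the example classes and URLs) are assumptions made for illustration only. The classifier that would assign a class to each fetched page is omitted, and class labels are given directly.

```python
import heapq
from collections import defaultdict


def extract_rules(labeled_links):
    """Estimate P(target class | source class) from (source, target) class pairs.

    Assumption: each observed hyperlink has already been labeled with the
    class of the page it starts from and the class of the page it points to.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in labeled_links:
        counts[src][dst] += 1
    rules = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        rules[src] = {dst: n / total for dst, n in dsts.items()}
    return rules


def rule_score(page_class, target, rules, depth=2):
    """Score how promising the outlinks of a page of `page_class` are.

    Hypothetical scoring: the chance that following links class by class
    reaches `target` within `depth` steps. A depth above 1 lets an
    off-topic page earn a nonzero score, which is the tunneling effect.
    """
    if depth == 0:
        return 0.0
    out = rules.get(page_class, {})
    score = out.get(target, 0.0)
    for cls, p in out.items():
        if cls != target:
            score += p * rule_score(cls, target, rules, depth - 1)
    return score


class Frontier:
    """Priority queue of unvisited URLs; the highest score is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score


# Toy training links, labeled by hypothetical classes.
links = [
    ("dept", "course"), ("dept", "course"), ("dept", "course"),
    ("dept", "sports"),
    ("course", "ai"), ("course", "course"),
    ("sports", "sports"),
]
rules = extract_rules(links)

frontier = Frontier()
# Each URL inherits a score from the class of the page that links to it.
frontier.push("http://example.org/a", rule_score("sports", "ai", rules))
frontier.push("http://example.org/b", rule_score("dept", "ai", rules))
frontier.push("http://example.org/c", rule_score("course", "ai", rules))
```

In this toy data a "dept" page never links directly to the target class "ai", so a one-step rule would score its outlinks at zero. With depth=2 they receive a positive score because "dept" pages often link to "course" pages, which in turn link to "ai" pages.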
Index Terms:
focused Web crawling, tunneling, rule extraction, Web mining, naïve Bayesian classification
Citation:
Ismail Sengör Altingövde, Özgür Ulusoy, "Exploiting Interclass Rules for Focused Crawling," IEEE Intelligent Systems, vol. 19, no. 6, pp. 66-73, Nov./Dec. 2004, doi:10.1109/MIS.2004.62