11th International Database Engineering and Applications Symposium (IDEAS 2007)
CINDI Robot: an Intelligent Web Crawler Based on Multi-level Inspection
Banff, Alberta, Canada
September 06-September 08
ISBN: 0-7695-2947-X
Rui Chen, Concordia University, Canada
Bipin C. Desai, Concordia University, Canada
Cong Zhou, Motorola Canada
With the explosion of the Web, focused web crawlers are gaining attention. Focused web crawlers aim to find web pages related to a predefined topic. CINDI Robot is a focused web crawler devoted to finding computer science and software engineering academic documents. We propose a multi-level inspection scheme to discover relevant web pages. In this scheme, the text features of the content contribute to the classification, while other web characteristics, such as URL patterns and anchor text, assist the decision process. Experimental results demonstrate that this multi-level inspection method outperforms other traditional methods.
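The idea of combining content features with URL and anchor-text evidence can be illustrated with a minimal sketch. This is not the CINDI Robot implementation: the term list, the URL patterns, the weights, and the function names are all hypothetical, and a simple keyword score stands in for the trained SVM or Naïve Bayes classifier named in the index terms.

```python
import re

# Hypothetical topic terms and URL patterns, chosen only for illustration.
TOPIC_TERMS = {"algorithm", "database", "software", "compiler", "crawler"}
URL_HINTS = re.compile(r"(\.edu\b|/papers?/|/publications?/|\.pdf$)", re.IGNORECASE)


def url_score(url: str) -> float:
    """Return 1.0 if the URL matches an assumed academic pattern, else 0.0."""
    return 1.0 if URL_HINTS.search(url) else 0.0


def term_score(text: str) -> float:
    """Keyword stand-in for a trained classifier: two distinct topic terms give 1.0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return min(1.0, len(words & TOPIC_TERMS) / 2)


def inspect(url: str, anchor_text: str, content: str, threshold: float = 0.5) -> bool:
    """Combine content, anchor-text, and URL evidence into one relevance decision.

    The weights (0.6 / 0.2 / 0.2) are assumptions, not values from the paper.
    """
    score = (
        0.6 * term_score(content)
        + 0.2 * term_score(anchor_text)
        + 0.2 * url_score(url)
    )
    return score >= threshold


relevant = inspect(
    "http://example.edu/papers/crawler.pdf",
    "focused crawler paper",
    "We present a crawler backed by a database and a new algorithm.",
)
irrelevant = inspect(
    "http://example.com/recipes/soup.html",
    "soup recipes",
    "Boil the vegetables and add salt.",
)
```

Here the content score dominates the decision and the URL and anchor text only assist it, which mirrors the roles described above; a weighted sum is just one simple way to combine the levels.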
Index Terms:
focused web crawler, SVM classifier, Naïve Bayes classifier, multi-level inspection, revised context graph, tunneling
Citation:
Rui Chen, Bipin C. Desai, Cong Zhou, "CINDI Robot: an Intelligent Web Crawler Based on Multi-level Inspection," ideas, pp.93-101, 11th International Database Engineering and Applications Symposium (IDEAS 2007), 2007