2010 Third International Symposium on Intelligent Information Technology and Security Informatics
A Focused Crawler Based on Naive Bayes Classifier
Jinggangshan, China
April 02-April 04
ISBN: 978-0-7695-4020-7
The exponential growth of information on the World Wide Web makes it increasingly difficult to find data relevant to a specific topic. This has driven growing interest in focused crawlers, programs that traverse the Web by selecting pages relevant to a predefined topic and ignoring the rest. This paper proposes a new focused crawler based on a Naive Bayes classifier: it uses an improved TF-IDF algorithm to extract features from page content and a Naive Bayes classifier to compute each page's rank. The crawler is compared with a breadth-first search (BFS) crawler and a PageRank crawler, and the results show that it achieves a higher harvest ratio than both.
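The general idea of a classifier-guided crawl can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: plain term counts stand in for the paper's improved TF-IDF weighting (which the abstract does not specify), outgoing links inherit the relevance score of the page they were found on, harvest ratio is taken to be the fraction of fetched pages that are on topic, and the names `NaiveBayes`, `focused_crawl`, `harvest_ratio`, and the toy in-memory `web` are all hypothetical.

```python
import heapq
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Two-class multinomial Naive Bayes with Laplace smoothing (illustrative)."""

    def __init__(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for text, label in zip(docs, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])

    def log_posterior(self, text, label):
        total = sum(self.counts[label].values())
        lp = math.log(self.n_docs[label] / sum(self.n_docs.values()))
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are skipped
                lp += math.log((self.counts[label][tok] + 1) / (total + len(self.vocab)))
        return lp

    def relevance(self, text):
        """Estimate P(relevant | text), normalised over the two classes."""
        a = self.log_posterior(text, True)
        b = self.log_posterior(text, False)
        m = max(a, b)  # subtract the max before exponentiating for stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        return ea / (ea + eb)


def focused_crawl(seed, web, clf, limit):
    """Best-first crawl over `web`, a dict of url -> (text, outgoing links).

    Assumption: a link is queued with the relevance of the page it was found on.
    """
    frontier = [(-1.0, seed)]  # min-heap, so scores are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        visited.append(url)
        score = clf.relevance(text)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


def harvest_ratio(visited, relevant):
    """Fraction of fetched pages that are on topic."""
    return sum(url in relevant for url in visited) / len(visited)


# Toy training set and link graph, invented purely for this example.
clf = NaiveBayes(
    ["focused crawler classifier", "crawler frontier topic",
     "football match goal", "league score transfer"],
    [True, True, False, False],
)
web = {
    "seed": ("python crawler index", ["a", "b"]),
    "a": ("focused crawler topic classifier", ["c", "d"]),
    "b": ("football match score goal", ["e", "f"]),
    "c": ("naive bayes classifier crawler", []),
    "d": ("crawler frontier priority queue", []),
    "e": ("football league transfer", []),
    "f": ("goal keeper match", []),
}
relevant = {"a", "c", "d"}
visited = focused_crawl("seed", web, clf, limit=4)
print(visited, harvest_ratio(visited, relevant))
```

On this toy graph the crawl fetches `seed`, `a`, `c`, `d` for a harvest ratio of 0.75, whereas a breadth-first crawl with the same budget would fetch the off-topic page `b` before reaching `d`. The numbers say nothing about the paper's reported results; they only show how a classifier score can order the frontier.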
Index Terms:
Focused Crawler, Naive Bayes, Classifier, TF-IDF
Citation:
Wenxian Wang, Xingshu Chen, Yongbin Zou, Haizhou Wang, Zongkun Dai, "A Focused Crawler Based on Naive Bayes Classifier," iitsi, pp.517-521, 2010 Third International Symposium on Intelligent Information Technology and Security Informatics, 2010