International Workshop on Knowledge Discovery and Data Mining (2009)
Jan. 23, 2009 to Jan. 25, 2009
ISBN: 978-0-7695-3543-2
pp: 44-47
The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. Crawling the entire Web quickly is an expensive and unrealistic goal because of the hardware and network resources it requires. A focused crawler is an agent that targets a particular topic: it visits and gathers only a narrow, relevant segment of the Web and tries not to waste resources on irrelevant material. It can be used to build domain-specific Web search portals and online personalized search tools. In this paper, we describe the design and implementation of a university-focused crawler that uses a back-propagation (BP) neural network classifier to predict which links lead to relevant pages. We present the flow of the system, discuss its performance, and report experimental results. Our experiments show that the BP classifier performs well in retrieving relevant university Web resources accurately.
Crawler; BP network; search engines; domain specific; Web resources
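
The focused-crawling idea described in the abstract can be illustrated with a minimal sketch: a best-first frontier that follows only links whose predicted relevance meets a threshold. This is not the paper's implementation. The in-memory `web` dictionary, the function names, the threshold, and in particular `keyword_score` (a simple keyword heuristic standing in for the trained BP network classifier) are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links as (anchor text, URL)).
Page = Tuple[str, List[Tuple[str, str]]]


def focused_crawl(
    web: Dict[str, Page],
    seeds: List[str],
    score_link: Callable[[str, str], float],
    threshold: float = 0.5,
    max_pages: int = 100,
) -> List[str]:
    """Visit pages best-first, following only links scored at or above threshold."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # unreachable page in this toy model
        visited.append(url)
        _, links = web[url]
        for anchor, target in links:
            if target in seen:
                continue
            score = score_link(anchor, target)
            if score >= threshold:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited


def keyword_score(anchor: str, url: str) -> float:
    """Assumed stand-in for the BP classifier: fraction of topic keywords present."""
    keywords = ("university", "admission", "faculty", ".edu")
    text = f"{anchor} {url}".lower()
    return sum(1 for k in keywords if k in text) / len(keywords)


if __name__ == "__main__":
    web = {
        "http://a.edu/": ("", [
            ("Faculty of the university", "http://a.edu/faculty"),
            ("Buy shoes", "http://shop.example/shoes"),
        ]),
        "http://a.edu/faculty": ("", [("Admission", "http://a.edu/admission")]),
        "http://a.edu/admission": ("", []),
        "http://shop.example/shoes": ("", []),
    }
    print(focused_crawl(web, ["http://a.edu/"], keyword_score))
```

In a system like the one the abstract describes, `score_link` would presumably be replaced by the output of the trained classifier applied to features of the link and its context; the frontier logic would stay the same.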

