Focused Crawling by Learning HMM from User's Topic-specific Browsing
2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04)
Beijing, China
September 20-24, 2004
ISBN: 0-7695-2100-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2004.10057
A focused crawler traverses the Web to gather documents on a specific topic. Predicting which links lead to good pages is not an easy task. In this paper, we present a new approach for predicting which links lead to relevant pages, based on a learned user model. In particular, we first collect the pages that a user visits during a learning session, in which the user browses the Web and explicitly marks the pages she is interested in. We then examine the semantic content of these pages to construct a concept graph, which is used to learn the dominant content and link structure leading to target pages using a Hidden Markov Model (HMM). Experiments show that, with an HMM learned from a user's browsing, the crawler performs better than the Best-First strategy.
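To make the idea of HMM-guided link prioritization concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract gives no model details, so the three hidden states (read here as rough distance to a target page), the two observation symbols (read as concept-cluster ids of pages), all probabilities, and the names `filter_states`, `link_priority`, and `order_frontier` are assumptions chosen only for illustration.

```python
import heapq

# Hypothetical toy model, not taken from the paper.
# Hidden states: 0 = on a target page, 1 = close to a target, 2 = far from a target.
# Observations: 0 = on-topic concept cluster, 1 = off-topic concept cluster.
INITIAL = [1 / 3, 1 / 3, 1 / 3]
TRANSITION = [[0.6, 0.3, 0.1],
              [0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]]
EMISSION = [[0.9, 0.1],
            [0.5, 0.5],
            [0.2, 0.8]]


def filter_states(observations, initial, transition, emission):
    """Normalized HMM forward pass: state distribution after the observations."""
    n = len(initial)
    belief = list(initial)
    for step, obs in enumerate(observations):
        if step > 0:
            # Propagate the belief one step through the transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Weight by the likelihood of the observed cluster, then normalize.
        weighted = [belief[j] * emission[j][obs] for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
    return belief


def link_priority(path_observations):
    """Estimated probability that the next page on this path is a target page."""
    belief = filter_states(path_observations, INITIAL, TRANSITION, EMISSION)
    return sum(belief[i] * TRANSITION[i][0] for i in range(len(belief)))


def order_frontier(candidates):
    """Return URLs sorted by descending priority, using a heap as the frontier."""
    heap = [(-link_priority(obs), url) for url, obs in candidates.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical frontier: each URL maps to the cluster ids seen along its path.
frontier = {
    "http://example.org/off-topic": [1, 1],
    "http://example.org/on-topic": [0, 0],
}
print(order_frontier(frontier))
```

In this sketch, a link reached through on-topic pages is crawled before one reached through off-topic pages. A Best-First crawler would instead rank links only by the content similarity of the page they appear on, without modeling the sequence of pages leading to it.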
Citation:
Hongyu Liu, Evangelos Milios, Jeannette Janssen, "Focused Crawling by Learning HMM from User's Topic-specific Browsing," wi, pp. 732-732, 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), 2004