A Web Surfer Model Incorporating Topic Continuity
May 2005 (vol. 17 no. 5)
pp. 726-729
This paper describes a surfer model which incorporates information about topic continuity derived from the surfer's history. Therefore, unlike earlier models, it captures the interrelationship between categorization (context) and ranking of Web documents simultaneously. The model is mathematically formulated. A scalable and convergent iterative procedure is provided for its implementation. Its different characteristic features, as obtained from the joint probability matrix, and their significance in Web intelligence are mentioned. Experiments performed on Web pages obtained from WebBase confirm the superiority of the model.
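
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, not the paper's formulation. It assumes a random surfer whose state is a (page, topic) pair: with probability `damping` the surfer follows an outgoing link and keeps the current topic with probability `continuity`, otherwise drawing a new topic from the destination page's topic distribution. The names `joint_surfer`, `page_topics`, `damping`, and `continuity`, and every parameter value, are hypothetical and chosen for illustration. Power iteration yields a joint probability matrix whose row sums give a page ranking and whose normalized rows give a per-page topic profile.

```python
def joint_surfer(links, page_topics, damping=0.85, continuity=0.7,
                 tol=1e-10, max_iter=1000):
    """Power iteration over (page, topic) states (illustrative assumption).

    links[i]       -- list of pages that page i links to
    page_topics[j] -- topic distribution of page j (each row sums to 1)
    Returns joint[i][t], the stationary probability of being on page i
    while surfing with topic t.
    """
    n, k = len(links), len(page_topics[0])
    joint = [[1.0 / (n * k)] * k for _ in range(n)]
    for _ in range(max_iter):
        new = [[0.0] * k for _ in range(n)]
        teleport = 0.0
        for i in range(n):
            mass_i = sum(joint[i])
            if not links[i]:
                teleport += mass_i  # dangling page: all mass jumps
                continue
            teleport += (1.0 - damping) * mass_i
            share = damping / len(links[i])
            for j in links[i]:
                for t in range(k):
                    # Topic continuity: keep the current topic t.
                    new[j][t] += share * continuity * joint[i][t]
                    # Otherwise pick a topic from page j's own profile.
                    new[j][t] += (share * (1.0 - continuity)
                                  * mass_i * page_topics[j][t])
        # Random jump: uniform page, topic drawn from that page's profile.
        for j in range(n):
            for t in range(k):
                new[j][t] += teleport / n * page_topics[j][t]
        delta = sum(abs(new[i][t] - joint[i][t])
                    for i in range(n) for t in range(k))
        joint = new
        if delta < tol:
            break
    return joint


def page_ranks(joint):
    """Marginal over topics: one score per page."""
    return [sum(row) for row in joint]


def topic_profile(joint, page):
    """Conditional topic distribution for one page."""
    total = sum(joint[page])
    return [p / total for p in joint[page]]


if __name__ == "__main__":
    # Toy graph: 0 -> 1, 2; 1 -> 2; 2 -> 0, with two topics.
    links = [[1, 2], [2], [0]]
    page_topics = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
    joint = joint_surfer(links, page_topics)
    print(page_ranks(joint))
    print(topic_profile(joint, 1))
```

In this sketch the page marginal coincides with ordinary PageRank, because the topic does not influence which link is followed; a model in which the topic biases link choice would couple ranking and categorization more tightly.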

Index Terms:
Web intelligence, probabilistic surfer history, page ranking, stochastic processes, context identification, categorization.
Sankar K. Pal, B.L. Narayan, Soumitra Dutta, "A Web Surfer Model Incorporating Topic Continuity," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 726-729, May 2005, doi:10.1109/TKDE.2005.69