Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
July/August 2003 (vol. 15 no. 4)
pp. 784-796

Abstract—The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme.
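The abstract describes two phases. Offline, one PageRank vector is computed per topic, with random-surfer teleportation biased toward that topic. At query time, those precomputed vectors are combined linearly. The sketch below uses NumPy to show the mechanics under stated assumptions; it is not the paper's implementation. The function names, toy link graph, topic labels ("sports", "science"), teleport probability `alpha = 0.25`, and the choice to route dangling-page mass to the topic's teleport vector are all illustrative assumptions. The per-topic weights are passed in directly. Deriving them from the query keywords or the surrounding context is not modeled here.

```python
import numpy as np


def biased_pagerank(adj, topic_pages, alpha=0.25, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank whose teleport step is biased to one topic.

    adj[i][j] == 1 means page i links to page j (hypothetical dense input).
    topic_pages: indices of pages assumed to belong to the topic.
    alpha: teleport probability (an assumed value for this sketch).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Teleport (personalization) vector: uniform over the topic's pages only.
    v = np.zeros(n)
    v[list(topic_pages)] = 1.0 / len(topic_pages)
    out_deg = adj.sum(axis=1)
    has_out = out_deg > 0
    # Column-stochastic transition matrix: M[j, i] = 1/outdeg(i) if i -> j.
    M = np.zeros((n, n))
    M[:, has_out] = (adj[has_out] / out_deg[has_out, None]).T
    rank = v.copy()
    for _ in range(max_iter):
        # Assumption: pages with no out-links hand their mass to the topic vector.
        dangling = rank[~has_out].sum()
        new = (1 - alpha) * (M @ rank + dangling * v) + alpha * v
        if np.abs(new - rank).sum() < tol:
            return new
        rank = new
    return rank


def topic_sensitive_scores(topic_ranks, topic_weights):
    """Query-time score: sum over topics of weight * precomputed biased vector.

    topic_weights is assumed to hold nonnegative weights summing to 1,
    e.g. estimated topic probabilities for the query or its context.
    """
    total = np.zeros_like(next(iter(topic_ranks.values())))
    for topic, weight in topic_weights.items():
        total += weight * topic_ranks[topic]
    return total


# Toy four-page Web graph (hypothetical); every page has at least one out-link.
links = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

# Offline phase: one biased vector per (hypothetical) topic.
ranks = {
    "sports": biased_pagerank(links, [0, 1]),
    "science": biased_pagerank(links, [3]),
}

# Query-time phase: weights stand in for the query's estimated topic mix.
scores = topic_sensitive_scores(ranks, {"sports": 0.8, "science": 0.2})
print(scores.round(3))
```

Because the biased vectors are precomputed, the query-time step in this sketch reduces to a weighted sum of one vector per topic. Changing the topic weights reorders pages without rerunning power iteration.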

Index Terms:
Web search, web graph, link analysis, PageRank, search in context, personalized search, ranking algorithm.
Citation:
Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, July-Aug. 2003, doi:10.1109/TKDE.2003.1208999