
Issue No. 05, May 2011 (vol. 22)

pp. 786-802

Ali Cevahir, Tokyo Institute of Technology, Tokyo

Cevdet Aykanat, Bilkent University, Ankara

Ata Turk, Bilkent University, Ankara

B. Barla Cambazoglu, Yahoo! Research, Barcelona

DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.119

ABSTRACT

The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications, where the web matrices involved grow along with the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is repeated frequently, must be performed in parallel with high efficiency and low preprocessing overhead, while taking into account the initially distributed nature of the web matrices. Our contributions in this work are twofold. First, we investigate the application of state-of-the-art sparse matrix partitioning models to attain high efficiency in parallel PageRank computations, with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix that use the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead equivalent to several iterations (for some instances, less than a single iteration) of the underlying sequential PageRank algorithm.
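The two ingredients named above, PageRank as repeated sparse matrix-vector products and site-based compression of the web matrix, can be illustrated with a small pure-Python sketch. The `pagerank` function is the textbook power iteration with uniform teleportation and uniform redistribution of dangling-page mass; it is not the specific state-of-the-art algorithm the authors parallelize. The `site_graph` helper is hypothetical: it collapses pages into sites by host name and counts inter-site links, which is one plausible way to obtain a smaller graph for a partitioner, not necessarily either of the two compression schemes evaluated in the paper. The function names, the `alpha`/`tol` parameters, the `out_links` input format and the vertex-weight choice are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pagerank(out_links, n, alpha=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank power iteration: one sparse matrix-vector
    product (over the out-link lists) per iteration.

    out_links: dict mapping page id -> list of target page ids (assumed format).
    Pages with no out-links are dangling; their score is spread uniformly.
    """
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [0.0] * n
        dangling = 0.0
        # y = P^T x, where P is the row-stochastic link matrix.
        for u in range(n):
            targets = out_links.get(u, [])
            if targets:
                share = x[u] / len(targets)
                for v in targets:
                    y[v] += share
            else:
                dangling += x[u]
        # Damping, plus uniform redistribution of dangling and teleport mass.
        base = (alpha * dangling + 1.0 - alpha) / n
        y = [alpha * yi + base for yi in y]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def site_graph(urls, out_links):
    """Hypothetical site-based compression: collapse pages into sites by host.

    Returns (site_ids, site_of, vertex_weight, edge_weight):
      vertex_weight[s] = pages in site s + out-links of those pages
                         (an assumed, rough proxy for per-iteration work);
      edge_weight[(s, t)] = number of page links from site s to a different site t.
    Intra-site links vanish from the compressed graph.
    """
    site_ids = {}
    site_of = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        site_of.append(site_ids.setdefault(host, len(site_ids)))
    vertex_weight = [0] * len(site_ids)
    edge_weight = defaultdict(int)
    for u in range(len(urls)):
        targets = out_links.get(u, [])
        vertex_weight[site_of[u]] += 1 + len(targets)
        for v in targets:
            if site_of[u] != site_of[v]:
                edge_weight[(site_of[u], site_of[v])] += 1
    return site_ids, site_of, vertex_weight, dict(edge_weight)


if __name__ == "__main__":
    urls = ["http://a.com/", "http://a.com/x",
            "http://b.org/", "http://b.org/y", "http://b.org/z"]
    out_links = {0: [1, 2], 1: [0], 2: [3, 4], 4: [0]}  # page 3 is dangling
    print([round(s, 4) for s in pagerank(out_links, len(urls))])
    print(site_graph(urls, out_links))
    # -> ({'a.com': 0, 'b.org': 1}, [0, 0, 1, 1, 1], [5, 6], {(0, 1): 1, (1, 0): 1})
```

In a scheme like this, a partitioner would work on the much smaller site graph and each page would inherit the part assigned to its site. This is one way compression can reduce the preprocessing cost discussed in the abstract, at the price of coarser load-balancing granularity.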

INDEX TERMS

PageRank, sparse matrix-vector multiplication, web search, parallelization, sparse matrix partitioning, graph partitioning, hypergraph partitioning, repartitioning.

CITATION

Ali Cevahir, Cevdet Aykanat, Ata Turk, B. Barla Cambazoglu, "Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation," *IEEE Transactions on Parallel & Distributed Systems*, vol. 22, no. 5, pp. 786-802, May 2011, doi:10.1109/TPDS.2010.119