Issue No. 09, Sept. 2012 (vol. 24)
pp. 1711-1725
Guoming He, Renmin University of China, Beijing
Cuiping Li, Renmin University of China, Beijing
Hong Chen, Renmin University of China, Beijing
Xiaoyong Du, Renmin University of China, Beijing
Haijun Feng, Renmin University of China, Beijing
ABSTRACT
Graph-based analysis has recently attracted considerable interest. One of its most important aspects is measuring the similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods for SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied to static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate SimRank computation on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov chain, we propose using iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied to dynamic graphs and handles not only the link-updating problem but also the node-updating problem. We give the corresponding theoretical justification and analysis, propose three optimization strategies to further improve computational efficiency, and extend the proposed algorithm to dynamic graphs. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.
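To make the cost problem mentioned above concrete, here is a minimal sketch of the naive fixed-point iteration from Jeh and Widom's original SimRank definition: s(a, a) = 1, and for a != b, s(a, b) = C / (|I(a)||I(b)|) times the sum of s(i, j) over all in-neighbours i of a and j of b (0 if either in-neighbour set is empty). This is not the GPU or iterative-aggregation method of this paper. The function name `simrank`, the dictionary `in_neighbors`, the decay factor C = 0.8, and the fixed iteration count are illustrative assumptions.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive SimRank fixed-point iteration (illustrative sketch).

    in_neighbors maps each node to the list of nodes that link to it.
    Each iteration touches every node pair and, for each pair, every
    pair of in-neighbours, i.e. roughly (number of edges)**2 lookups.
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0  # a node is always maximally similar to itself
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                new[(a, b)] = 0.0  # no in-neighbours means no evidence of similarity
                continue
            # Average the previous scores of all in-neighbour pairs, damped by C.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim
```

For example, if a single node `u` links to both `a` and `b`, then `a` and `b` share their only in-neighbour, so their score is C times s(u, u) = 0.8, while `u`, having no in-neighbours, scores 0 against both. The quadratic number of node pairs, each multiplied by the product of in-degrees, is what makes this direct approach expensive on large graphs.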
INDEX TERMS
iterative aggregation, GPU, parallel, SimRank, graph
CITATION
Guoming He, Cuiping Li, Hong Chen, Xiaoyong Du, Haijun Feng, "Using Graphics Processors for High Performance SimRank Computation," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 9, pp. 1711-1725, Sept. 2012, doi:10.1109/TKDE.2011.91