Issue No. 12, Dec. 2012 (vol. 24)

pp: 2232-2243

Giorgos Kollias, Purdue University, West Lafayette

Shahin Mohammadi, Purdue University, West Lafayette

Ananth Grama, Purdue University, West Lafayette

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.174

ABSTRACT

As graph-structured data sets become commonplace, there is an increasing need for efficient ways of analyzing such data sets. These analyses include conservation, alignment, differentiation, and discrimination, among others. When defined on general graphs, these problems are considerably harder than their well-studied counterparts on sets and sequences. In this paper, we study the problem of global alignment of large sparse graphs. Specifically, we investigate efficient methods for computing approximations to the state-of-the-art IsoRank solution for finding pairwise topological similarity between nodes in two networks (or within the same network). Pairs of nodes with high similarity can be used to seed global alignments. We present a novel approach to this computationally expensive problem based on uncoupling and decomposing the ranking calculations associated with the computation of similarity scores. Uncoupling refers to independent preprocessing of each input graph. Decomposition implies that pairwise similarity scores can be explicitly broken down into contributions from different link patterns, traced back to a low-rank approximation of the initial conditions for the computation. These two concepts result in significant improvements in computational cost, interpretability of similarity scores, and the nature of supported queries. On real data sets, we show over two orders of magnitude improvement in performance over IsoRank/Random Walk formulations, and over an order of magnitude improvement over constrained matrix-triple-product formulations.
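The abstract describes uncoupling and decomposition only in words. The sketch below illustrates the idea under stated assumptions; it is not the authors' implementation. It assumes an IsoRank-style update of the form X ← αAXBᵀ + (1 − α)H on column-normalized adjacency matrices A and B, and a rank-1 initial condition H = wzᵀ. The function names, the default α, and the iteration count are illustrative choices that do not come from the source.

```python
import numpy as np

def column_normalize(adj):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    return adj / col_sums

def coupled_similarity(A, B, w, z, alpha=0.8, n_iter=20):
    """Reference iteration that updates the full |V_A| x |V_B| matrix X."""
    H = np.outer(w, z)  # assumed rank-1 initial condition
    X = H.copy()
    for _ in range(n_iter):
        X = alpha * A @ X @ B.T + (1 - alpha) * H
    return X

def decomposed_similarity(A, B, w, z, alpha=0.8, n_iter=20):
    """Same matrix, built from per-graph vector iterates and outer products."""
    # Uncoupling: each vector sequence depends on one graph only.
    ws, zs = [w], [z]
    for _ in range(n_iter):
        ws.append(A @ ws[-1])
        zs.append(B @ zs[-1])
    # Decomposition: X is a weighted sum of rank-1 terms, one per power k.
    X = alpha**n_iter * np.outer(ws[n_iter], zs[n_iter])
    for k in range(n_iter):
        X += (1 - alpha) * alpha**k * np.outer(ws[k], zs[k])
    return X
```

Unrolling the assumed update shows why the two functions agree: after n steps, X equals αⁿAⁿH(Bᵀ)ⁿ plus (1 − α) times the sum of αᵏAᵏH(Bᵀ)ᵏ for k < n, and with H = wzᵀ each term is the outer product of Aᵏw and Bᵏz. The decomposed form needs only matrix-vector products on each graph separately, and each rank-1 term can be inspected on its own. For a rank-s initial condition, the same computation would be repeated per component and the results summed.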

INDEX TERMS

Proteins, Approximation methods, Context, Mathematical model, Computational modeling, Equations, Stability analysis, singular value decomposition, Data mining, sparse, structured, and very large systems

CITATION

Giorgos Kollias, Shahin Mohammadi, Ananth Grama, "Network Similarity Decomposition (NSD): A Fast and Scalable Approach to Network Alignment",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 24, no. 12, pp. 2232-2243, Dec. 2012, doi:10.1109/TKDE.2011.174