Protein Complexes Discovery Based on Protein-Protein Interaction Data via a Regularized Sparse Generative Network Model
May-June 2012 (vol. 9 no. 3)
pp. 857-870
Dao-Qing Dai, Center for Comput. Vision & Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Xiao-Fei Zhang, Center for Comput. Vision & Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Xiao-Xin Li, Center for Comput. Vision & Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Detecting protein complexes from protein interaction networks is a major task in the postgenome era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense-region finding. Most of these traditional algorithms cannot discover the overlapping complexes that really exist in protein-protein interaction (PPI) networks. Although some density-based methods have been developed to identify overlapping complexes, they cannot discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification. RSGNM extends an existing generative network model by adding a process that generates propensities from an exponential distribution and by incorporating a Laplacian regularizer. Because the propensities are assumed to be generated from an exponential distribution, their estimators are sparse, which not only has a good biological interpretation but also helps to control the overlapping rate among detected complexes. The Laplacian regularizer makes the estimated propensities smoother over the interaction network. Experimental results on three yeast PPI networks show that RSGNM outperforms six previous competing algorithms in terms of the quality of detected complexes. In addition, RSGNM is able to detect overlapping complexes and complexes that include peripheral proteins simultaneously. These results give new insights into the importance of generative network models in protein complex identification.
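
The abstract does not state the model's equations, so the following is only an illustrative sketch of an objective of the kind it describes, not the authors' formulation. It assumes a Poisson-type generative network model in which the expected interaction between proteins i and j is the sum over complexes of the product of their propensities. Under that assumption, an exponential prior on nonnegative propensities contributes a penalty proportional to their sum, and the Laplacian regularizer penalizes propensity differences across interacting proteins. The function names, the weights `lam` and `gamma`, and the membership threshold `tau` are all hypothetical.

```python
import numpy as np

def rsgnm_style_objective(A, theta, lam=1.0, gamma=1.0, eps=1e-12):
    """Penalized negative log-likelihood, up to constants (assumed form).

    A     : symmetric (n, n) adjacency matrix of the PPI network
    theta : nonnegative (n, K) matrix of protein-complex propensities
    """
    rate = theta @ theta.T                       # expected interaction strength
    poisson_nll = np.sum(rate - A * np.log(rate + eps))
    sparsity = lam * np.sum(theta)               # exponential prior on theta >= 0
    L = np.diag(A.sum(axis=1)) - A               # unnormalized graph Laplacian
    smoothness = gamma * np.trace(theta.T @ L @ theta)
    return poisson_nll + sparsity + smoothness

def complexes_from_propensities(theta, tau=0.3):
    """Assign protein i to complex k when theta[i, k] >= tau (hypothetical rule).

    A protein can pass the threshold in several columns, so complexes may overlap.
    """
    return [set(np.flatnonzero(theta[:, k] >= tau).tolist())
            for k in range(theta.shape[1])]

# Toy network: two triangles, {0, 1, 2} and {2, 3, 4}, sharing protein 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

theta = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])

complexes = complexes_from_propensities(theta)
with_smoothing = rsgnm_style_objective(A, theta, gamma=1.0)
without_smoothing = rsgnm_style_objective(A, theta, gamma=0.0)
```

In this toy example protein 2 has a nonzero propensity for both columns and therefore appears in both extracted complexes, which is the kind of overlap the abstract refers to. A real fit would minimize the objective over `theta` subject to nonnegativity; the optimization procedure is not described in the abstract and is omitted here.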

Index Terms:
proteins, biochemistry, biology computing, exponential distribution, genomics, molecular biophysics, physiological models, competing algorithms, protein complexes discovery, protein-protein interaction data, regularized sparse generative network model, detecting protein complexes, postgenome era, computational algorithms, traditional algorithms, protein-protein interaction networks, density-based methods, peripheral proteins, generative network model, generation processing, Laplacian regularizer, protein complexes identification, communities, biological system modeling, RNA, polymers, protein complex, regularization method, overlapping complex
Citation:
Dao-Qing Dai, Xiao-Fei Zhang, Xiao-Xin Li, "Protein Complexes Discovery Based on Protein-Protein Interaction Data via a Regularized Sparse Generative Network Model," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 857-870, May-June 2012, doi:10.1109/TCBB.2012.20