A Max-Flow-Based Approach to the Identification of Protein Complexes Using Protein Interaction and Microarray Data
May/June 2011 (vol. 8 no. 3)
pp. 621-634
Jianxing Feng, Tsinghua Univ., Beijing
Rui Jiang, Tsinghua University, Beijing
Tao Jiang, University of California, Riverside
High-throughput technologies have produced abundant protein-protein interaction (PPI) data and microarray gene expression profiles, creating a great opportunity to identify novel protein complexes computationally. By combining these two types of data, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a PPI network and then iteratively breaks each such subgraph into fragments, weighting its nodes according to their log-fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Tests on three widely used PPI data sets, and comparisons with several recent methods for protein complex identification, demonstrate the strong performance of our method in predicting novel protein complexes in terms of specificity and efficiency. Given the high specificity (or precision) that our method achieves, we conjecture that our prediction results contain more than 200 novel protein complexes.
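To make the densest-subgraph building block concrete, the following is a minimal sketch of the classical flow-based computation of a (node-weighted) densest subgraph, where density is the number of internal edges divided by the total node weight. It is not the authors' GFA implementation. The function names, the optional `weight` mapping, and the iterative density update are assumptions made for illustration, and the abstract does not say how GFA derives node weights from log-fold changes. Weights are assumed to be positive and the graph simple.

```python
from collections import defaultdict, deque
from fractions import Fraction


def add_edge(cap, u, v, c):
    """Add a directed edge u->v with capacity c and a zero-capacity reverse edge."""
    cap[u][v] = cap[u].get(v, 0) + c
    cap[v].setdefault(u, 0)


def source_side_of_min_cut(cap, s, t):
    """Run Edmonds-Karp on residual capacities; return nodes still reachable from s."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # no augmenting path left: this is a minimum cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push


def densest_subgraph(edges, weight=None):
    """Return (node set, density) maximizing |E(S)| / sum of node weights in S."""
    nodes = {v for e in edges for v in e}
    w = {v: Fraction(1) for v in nodes}
    w.update({v: Fraction(x) for v, x in (weight or {}).items()})
    m = len(edges)

    def density(sub):
        inside = sum(1 for u, v in edges if u in sub and v in sub)
        return Fraction(inside) / sum(w[v] for v in sub)

    best, g = set(nodes), density(nodes)
    s, t = ("source",), ("sink",)
    while True:
        # A min cut of this network maximizes |E(S)| - g * weight(S).
        cap = defaultdict(dict)
        for i, (u, v) in enumerate(edges):
            add_edge(cap, s, ("e", i), 1)
            add_edge(cap, ("e", i), ("n", u), m + 1)  # m + 1 acts as infinity
            add_edge(cap, ("e", i), ("n", v), m + 1)
        for v in nodes:
            add_edge(cap, ("n", v), t, g * w[v])
        side = source_side_of_min_cut(cap, s, t)
        sub = {v for v in nodes if ("n", v) in side}
        if not sub or density(sub) <= g:
            return best, g  # no subgraph is denser than the current guess
        best, g = sub, density(sub)


# A 4-clique {a, b, c, d} with a short tail d-e-f attached.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
print(densest_subgraph(edges))
```

On this toy graph the unweighted call returns the 4-clique with density 3/2. Passing a hypothetical weight such as `{"d": 4}` shifts the optimum to the triangle `{a, b, c}`, which illustrates in a simplified way how reweighting nodes can break a dense subgraph into smaller fragments. Exact rational arithmetic is used so that the stopping test does not depend on floating-point tolerance.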

Index Terms:
Protein complex, protein-protein interaction network, microarray, dense subgraph, maximum network flow, efficient algorithm.
Citation:
Jianxing Feng, Rui Jiang, Tao Jiang, "A Max-Flow-Based Approach to the Identification of Protein Complexes Using Protein Interaction and Microarray Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 621-634, May-June 2011, doi:10.1109/TCBB.2010.78