Predicting Protein Function by Multi-Label Correlated Semi-Supervised Learning
July-Aug. 2012 (vol. 9 no. 4)
pp. 1059-1069
L. J. McQuay, Pharmacoepidemiology & Risk Manage. Div., RTI Health Solutions, Research Triangle Park, NC, USA
J. Q. Jiang, Dept. of Inf. Syst., City Univ. of Hong Kong, Kowloon, China
Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of protein-protein interaction (PPI) data has led to a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and therefore often suffer from the scarcity of labeled data. In reality, different functional classes are naturally dependent on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), which incorporates the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by both the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the topologies of these two networks match well. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross-validation on the yeast proteome shows that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem of scarce labeled data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.
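
To make the LGC baseline mentioned in the abstract concrete, the sketch below runs the standard LGC update, F ← αSF + (1 − α)Y with S = D^{-1/2} W D^{-1/2}, on a toy protein graph. It then applies the same smoothing along a small class network. That second step is only an illustrative assumption about what interclass consistency could look like; it is not the MCSL objective, which the abstract does not spell out. The matrices `W`, `Y`, and `C`, the helper names, and the values of `alpha` are all made up for the example.

```python
import math

def normalize(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, nonnegative weight matrix W."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def lgc(W, Y, alpha=0.5, iters=200):
    """Iterate the LGC update F <- alpha * S F + (1 - alpha) * Y."""
    S = normalize(W)
    n, k = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][m] * F[m][c] for m in range(n))
              + (1 - alpha) * Y[i][c] for c in range(k)] for i in range(n)]
    return F

def transpose(M):
    return [list(col) for col in zip(*M)]

# Toy PPI network (made-up data): five proteins on a path 0-1-2-3-4.
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]

# Rows are proteins, columns are three functional classes.
# Only protein 0 (class 0) and protein 4 (class 2) are labeled.
Y = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 0],
     [0, 0, 1]]

# Hypothetical class network: classes 0 and 1 are correlated, class 2 is isolated.
C = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

# Step 1: smooth the labels over the protein graph (plain LGC, one class at a time).
F = lgc(W, Y)

# Step 2 (illustrative assumption, not MCSL): smooth each protein's score
# vector over the class graph by running the same update on the transpose.
H = transpose(lgc(C, transpose(F), alpha=0.3))

for i, (f_row, h_row) in enumerate(zip(F, H)):
    print(i, [round(x, 3) for x in f_row], [round(x, 3) for x in h_row])
```

In this toy setting no protein is labeled with class 1, so plain LGC leaves every class 1 score at zero. After the class-side smoothing, proteins near protein 0 receive a nonzero class 1 score through its correlation with class 0, which is the kind of effect the abstract attributes to exploiting class dependencies when labels are scarce.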

Index Terms:
proteins, bioinformatics, genetics, graph theory, learning (artificial intelligence), pattern classification, yeast proteome, protein function prediction, multilabel correlated semisupervised learning, biological function, postgenomic era, protein-protein interaction, intrinsic correlation, functional class network, classification function, subgraph, topology, regularized learning, intraclass consistency, interclass consistency, graph-based learning, local consistency method, global consistency method, cross validation, Proteins, Kernel, Prediction algorithms, Bioinformatics, Correlation, Computational biology, Electronic mail, functional class correlation, Protein function prediction, semi-supervised learning, multi-label learning, kernel learning
Citation:
L. J. McQuay, J. Q. Jiang, "Predicting Protein Function by Multi-Label Correlated Semi-Supervised Learning," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1059-1069, July-Aug. 2012, doi:10.1109/TCBB.2011.156