A Framework for Incorporating Functional Interrelationships into Protein Function Prediction Algorithms
May-June 2012 (vol. 9 no. 3)
pp. 740-753
Xiao-Fei Zhang, Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Dao-Qing Dai, Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
The functional annotation of proteins is one of the most important tasks in the post-genomic era. Although many computational approaches have been developed in recent years to predict protein function, most of these traditional algorithms do not take interrelationships among functional terms into account, for example the fact that different GO terms often coannotate common proteins. In this study, we propose a new functional similarity measure in the form of a Jaccard coefficient to quantify these interrelationships, and we develop a framework for incorporating GO term similarity into the protein function prediction process. Cross-validation experiments on S. cerevisiae and Homo sapiens data sets demonstrate that our method improves the performance of protein function prediction. In addition, we find that small terms, which are associated with only a few proteins, benefit more than large ones when functional interrelationships are considered. We also compare our similarity measure with two other widely used measures, and the results indicate that our proposed measure is more effective when incorporated into function prediction algorithms. Experimental results also show that our algorithms achieve higher prediction accuracy than two previous competing algorithms that also take functional interrelationships into account. Finally, we show that our method is robust to the current incompleteness of annotations in the database. These results give new insights into the importance of functional interrelationships in protein function prediction.
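As an illustration of the coannotation idea in the abstract, the following is a minimal sketch of a Jaccard coefficient computed between two functional terms from the sets of proteins annotated with each. The abstract does not give the exact definition used in the paper, so the plain set-based form |A ∩ B| / |A ∪ B| is an assumption here, and the function names, term labels, and protein names are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Plain Jaccard coefficient |A & B| / |A | B| (assumed form).

    Returns 0.0 when both sets are empty to avoid division by zero.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def term_similarities(annotations: dict) -> dict:
    """Pairwise similarity between terms, keyed by sorted term pair.

    `annotations` maps each term to the set of proteins annotated with it.
    """
    return {
        (t1, t2): jaccard(annotations[t1], annotations[t2])
        for t1, t2 in combinations(sorted(annotations), 2)
    }


# Hypothetical toy data: labels are placeholders, not real GO terms.
toy_annotations = {
    "term_A": {"P1", "P2", "P3"},
    "term_B": {"P2", "P3", "P4"},
    "term_C": {"P5"},
}

similarities = term_similarities(toy_annotations)
for pair, score in similarities.items():
    print(pair, round(score, 3))
```

In this toy example the two terms that share two of four distinct proteins score 0.5, while terms with no protein in common score 0. How such scores are then fed into a prediction algorithm is the subject of the paper's framework and is not sketched here.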


Index Terms:
proteins, cellular biophysics, genomics, microorganisms, molecular biophysics, S. cerevisiae, protein function prediction algorithms, functional interrelationships, functional annotation, post-genomic era, traditional algorithms, functional similarity measurement, Jaccard coefficient, Homo sapiens data sets, S. cerevisiae data sets, Prediction algorithms, Bioinformatics, Computational biology, Training data, Training, RNA, Gaussian random fields model, Protein function prediction, Gene Ontology, semantic similarity measure, protein-protein interaction
Xiao-Fei Zhang, Dao-Qing Dai, "A Framework for Incorporating Functional Interrelationships into Protein Function Prediction Algorithms," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 740-753, May-June 2012, doi:10.1109/TCBB.2011.148