Molecular Function Prediction Using Neighborhood Features
April-June 2010 (vol. 7 no. 2)
pp. 208-217
Petko Bogdanov, University of California at Santa Barbara, Santa Barbara
Ambuj K. Singh, University of California at Santa Barbara, Santa Barbara
The recent advent of high-throughput methods has generated large amounts of gene interaction data, allowing the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting their molecular function remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhoods, regardless of the distance between them in the interaction network. We thus predict molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides natural control over the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparse genomes by exploiting features from well-annotated genomes.
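
The two-phase approach summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the feature definition, the distance measure, or any parameter values, so everything below is an assumption made for illustration. Here a gene's feature for each function is taken to be the total random-walk visit probability that lands on neighbors annotated with that function, feature vectors are compared with L1 distance, and the toy network, gene names, function labels, `restart=0.3`, and `k=1` are all hypothetical.

```python
from collections import defaultdict

def rwr(graph, start, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visit probabilities of a random walk with restarts at `start`."""
    p = {node: 0.0 for node in graph}
    p[start] = 1.0
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in graph}
        for node, mass in p.items():
            nbrs = graph[node]
            total = sum(nbrs.values())
            if total == 0:
                # Isolated node: return its walking mass to the start node.
                nxt[start] += (1 - restart) * mass
                continue
            for nbr, weight in nbrs.items():
                nxt[nbr] += (1 - restart) * mass * weight / total
        nxt[start] += restart  # restart step
        diff = sum(abs(nxt[n] - p[n]) for n in graph)
        p = nxt
        if diff < tol:
            break
    return p

def neighborhood_features(graph, annotations, gene, functions, restart=0.3):
    """Assumed feature: visit probability mass per function, excluding the gene itself."""
    p = rwr(graph, gene, restart)
    return [
        sum(prob for node, prob in p.items()
            if node != gene and f in annotations.get(node, ()))
        for f in functions
    ]

def knn_predict(graph, annotations, query, functions, k=1, restart=0.3):
    """Rank functions by votes from the k annotated genes with the closest features."""
    q = neighborhood_features(graph, annotations, query, functions, restart)
    dists = []
    for gene, labels in annotations.items():
        if gene == query or not labels:
            continue
        f = neighborhood_features(graph, annotations, gene, functions, restart)
        dists.append((sum(abs(a - b) for a, b in zip(q, f)), gene))
    dists.sort()
    votes = defaultdict(int)
    for _, gene in dists[:k]:
        for label in annotations[gene]:
            votes[label] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy network: two disconnected hubs with identical neighborhoods.
graph = {
    "a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0},
    "x": {"y": 1.0, "z": 1.0}, "y": {"x": 1.0}, "z": {"x": 1.0},
}
annotations = {
    "a": {"kinase"},
    "b": {"transport"}, "c": {"transport"},
    "y": {"transport"}, "z": {"transport"},
}
functions = ["kinase", "transport"]

# Gene "x" is uncharacterized; its neighborhood pattern matches that of "a".
print(knn_predict(graph, annotations, "x", functions, k=1))  # [('kinase', 1)]
```

In this toy network the uncharacterized gene `x` has no path to `a`, yet both are surrounded by `transport` genes, so the nearest neighbor in feature space is `a` and the predicted function is `kinase`. A method that only propagates labels from topologically close genes would instead see nothing but `transport` around `x`. Raising `k` and requiring a minimum vote count would be one possible way to trade coverage for accuracy, though the abstract does not say how the authors control that trade-off.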

Index Terms:
Gene function prediction, feature extraction, classification, functional interaction network.
Petko Bogdanov, Ambuj K. Singh, "Molecular Function Prediction Using Neighborhood Features," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 208-217, April-June 2010, doi:10.1109/TCBB.2009.81