Essential Latent Knowledge for Protein-Protein Interactions: Analysis by an Unsupervised Learning Approach
April-June 2005 (vol. 2 no. 2)
pp. 119-130

Abstract—Protein-protein interactions play a number of central roles in many cellular functions, including DNA replication, transcription and translation, signal transduction, and metabolic pathways. The recent increase in the number of known protein-protein interactions has made predicting unknown interactions important for understanding living cells. However, the protein-protein interactions obtained experimentally so far are often incomplete and contradictory, and consequently, existing computational prediction methods have integrated evidence (latent knowledge of proteins) from different, more reliable sources. Analyzing the relationships between proteins and this latent knowledge is important for understanding cellular processes. For this analysis, we propose a new probabilistic model of protein-protein interactions that takes the latent knowledge of proteins into account. We further present an efficient learning algorithm for this model, based on the EM algorithm. Experimental results show that, in a supervised test setting, the proposed method outperformed five competing methods in all cases, with statistically significant differences. Using the probability parameters of a trained model, we further identified the latent knowledge that is essential to predicting protein-protein interactions. Overall, our experimental results confirm that the proposed model is especially effective for analyzing protein-protein interactions from the viewpoint of the latent knowledge of proteins.

Index Terms:
Biology and genetics, machine learning, data mining, mining methods and algorithms.
Hiroshi Mamitsuka, "Essential Latent Knowledge for Protein-Protein Interactions: Analysis by an Unsupervised Learning Approach," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 2, pp. 119-130, April-June 2005, doi:10.1109/TCBB.2005.23