14th International Workshop on Database and Expert Systems Applications (DEXA'03)
Efficient Mining from Heterogeneous Data Sets for Predicting Protein-Protein Interactions
Prague, Czech Republic
September 1-5, 2003
ISBN: 0-7695-1993-8
One of the most important issues in current molecular biology is to build accurate networks of protein-protein interactions from currently available biological knowledge and information. We describe a method for predicting protein-protein interactions and demonstrate its effectiveness. The method uses a stochastic model that combines protein-protein interaction data with existing knowledge of proteins. In this paper, the existing knowledge we use is a classification of proteins; in commonly available classifications, a single protein can belong to multiple classes. Exploiting this property, we treat protein classes as a latent variable in the stochastic model and estimate the model parameters from both the interaction data and the protein classes with a time-efficient EM (Expectation-Maximization) algorithm. We evaluated the method experimentally using actual protein-protein interaction data and a protein classification; the results show that it significantly outperformed the other methods tested in our experiments.
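The abstract does not give the model's equations, so the following is only a rough illustration of the general idea. It treats protein classes as latent variables, uses the known classification as a constraint, and fits the model with EM. The sketch *assumes* a generic aspect-model-style mixture, P(x, y) = sum over classes a, b of pi[a, b] * theta[a][x] * theta[b][y], where theta[a][x] is fixed at 0 whenever protein x is not in class a. It is not presented as the authors' model. The function names `fit_class_latent_model` and `interaction_score`, as well as the toy proteins and class labels, are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_class_latent_model(pairs, membership, n_iter=100, seed=0):
    """EM for an ASSUMED model P(x, y) = sum_{a,b} pi[a,b] * theta[a][x] * theta[b][y].

    pairs      -- observed interacting pairs (x, y) of protein ids
    membership -- dict: protein id -> set of class labels (a protein may have several)
    theta[c] only has entries for members of class c, so the classification acts
    as a hard constraint on which latent class can generate which protein.
    Every protein appearing in `pairs` must have at least one class.
    """
    rng = random.Random(seed)
    classes = sorted({c for cs in membership.values() for c in cs})
    # Random positive start for theta, restricted to each class's members.
    theta = {}
    for c in classes:
        members = [p for p, cs in membership.items() if c in cs]
        w = {p: rng.uniform(0.5, 1.5) for p in members}
        total = sum(w.values())
        theta[c] = {p: v / total for p, v in w.items()}
    pi = {(a, b): 1.0 / len(classes) ** 2 for a in classes for b in classes}

    history = []  # log-likelihood at the start of each iteration
    for _ in range(n_iter):
        pi_count = defaultdict(float)
        emit = {c: defaultdict(float) for c in classes}
        loglik = 0.0
        for x, y in pairs:
            # E-step: posterior over (class of x, class of y) for this pair.
            joint = {(a, b): pi[(a, b)] * theta[a][x] * theta[b][y]
                     for a in membership[x] for b in membership[y]}
            z = sum(joint.values())
            loglik += math.log(z)
            for (a, b), v in joint.items():
                r = v / z
                pi_count[(a, b)] += r
                emit[a][x] += r
                emit[b][y] += r
        history.append(loglik)
        # M-step: renormalise the expected counts.
        n = len(pairs)
        pi = {k: pi_count[k] / n for k in pi}
        for c in classes:
            total = sum(emit[c].values())
            if total > 0:  # a class with no expected use keeps its old theta
                theta[c] = {p: emit[c][p] / total for p in theta[c]}
    return pi, theta, history


def interaction_score(x, y, pi, theta, membership):
    """Model probability P(x, y); a higher value suggests a more likely interaction."""
    return sum(pi[(a, b)] * theta[a][x] * theta[b][y]
               for a in membership[x] for b in membership[y])


# Hypothetical toy data: proteins P2 and P5 each belong to two classes.
membership = {
    "P1": {"kinase"}, "P2": {"kinase", "receptor"},
    "P3": {"receptor"}, "P4": {"transport"}, "P5": {"transport", "kinase"},
}
pairs = [("P1", "P3"), ("P2", "P3"), ("P1", "P2"), ("P4", "P5"), ("P5", "P4")]
pi, theta, hist = fit_class_latent_model(pairs, membership, n_iter=50)
```

Each EM iteration does not decrease the log-likelihood stored in `hist`. Candidate pairs can be ranked by `interaction_score`. In this toy run, no training pair links a "receptor" protein to a "transport" protein, so `interaction_score("P3", "P4", ...)` is 0.0. A real system would likely smooth `pi` to avoid such hard zeros.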