2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2015)
Washington, DC, USA
Nov. 9, 2015 to Nov. 12, 2015
Antonio Sze-To, Systems Design Engineering, University of Waterloo, Canada
Sanderz Fung, Systems Design Engineering, University of Waterloo, Canada
En-Shiun Annie Lee, Systems Design Engineering, University of Waterloo, Canada
Andrew K. C. Wong, Systems Design Engineering, University of Waterloo, Canada
Understanding protein-protein interactions (PPIs) is of fundamental importance in deciphering cellular processes. Predicting PPIs is thus critical to making new discoveries in biology. Traditionally, new PPIs are identified through biochemical experiments, but such methods are labor-intensive, expensive, time-consuming and prone to high false-positive rates. Computational docking is an alternative, but it requires the three-dimensional structures of the target proteins, which are not always available. Sequence-based prediction is the most readily applicable and cost-effective method. It exploits known PPI databases to construct classifiers that predict unknown PPIs from sequence data alone. However, existing methods adopt features that fix the pattern length and require exact patterns, which is biologically unrealistic. Moreover, methods based on SVMs and string kernels are hard to interpret biologically because they do not compute the features explicitly. Recently, we developed a new method for predicting PPIs, known as WeMine-P2P, based on our WeMine Aligned Pattern Clustering algorithm, which discovers and identifies localized, co-occurring conserved patterns and regions while allowing variable lengths and pattern variations. In this first attempt, across 40 independent experiments, we showed that (1) WeMine-P2P outperforms the well-known algorithm PIPE2, which also uses co-occurring amino acid sequence segments but does not allow variable lengths or pattern variations; (2) unlike SVM-based methods, WeMine-P2P renders interpretable biological features; and (3) WeMine-P2P achieves satisfactory PPI prediction performance, comparable to that of SVM-based methods, particularly on unseen protein sequences, with a potential 1280-fold reduction in feature dimension. WeMine-P2P is extendable to predicting other biosequence interactions, such as protein-DNA interactions.
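As a rough illustration of the idea in the abstract above (features built from co-occurring pattern clusters whose members vary in length and residue content), the following Python sketch may help. It is a toy under stated assumptions, not the WeMine-P2P implementation: the cluster IDs, the pattern variants, the example sequences, and the function names `matched_clusters` and `cooccurrence_features` are all hypothetical, and plain substring matching stands in for the paper's pattern discovery and alignment.

```python
from itertools import product

# Hypothetical pattern clusters (not taken from the paper): each cluster
# groups variants of a conserved segment, with differing lengths and residues.
CLUSTERS = {
    "C1": {"GKS", "GKT", "GRST"},
    "C2": {"DEAD", "DEAH"},
    "C3": {"HRD", "HKD"},
}


def matched_clusters(sequence, clusters=CLUSTERS):
    """Return the IDs of clusters with at least one variant found in sequence."""
    return {
        cluster_id
        for cluster_id, variants in clusters.items()
        if any(variant in sequence for variant in variants)
    }


def cooccurrence_features(seq_a, seq_b, clusters=CLUSTERS):
    """Build one binary feature per ordered cluster pair (i, j).

    A feature is 1 when cluster i matches protein A and cluster j matches
    protein B, so every feature can be read back as a named pair of clusters.
    """
    hits_a = matched_clusters(seq_a, clusters)
    hits_b = matched_clusters(seq_b, clusters)
    cluster_ids = sorted(clusters)
    return {
        (i, j): int(i in hits_a and j in hits_b)
        for i, j in product(cluster_ids, cluster_ids)
    }


if __name__ == "__main__":
    # Made-up sequences: the first contains "GKT" (C1), the second "HRD" (C3).
    features = cooccurrence_features("MAGKTLL", "QQHRDPP")
    active = [pair for pair, value in features.items() if value]
    print(active)
```

Feature vectors of this kind could then be passed to a supervised classifier such as the Random Forest named in the keywords. Because each feature corresponds to a named cluster pair, a feature that the classifier finds informative points back to specific sequence segments.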
Random Forest, Protein-Protein Interaction, Co-Occurrence Aligned Pattern Cluster, Supervised Learning
A. Sze-To, S. Fung, E. A. Lee and A. K. Wong, "Predicting Protein-protein interaction using co-occurring Aligned Pattern Clusters," 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA, 2015, pp. 55-60.