Issue No.05 - September/October (2011 vol.8)
pp: 1344-1357
Peter Boyen, Hasselt University, Diepenbeek and Transnational University of Limburg
Dries Van Dyck, Hasselt University, Diepenbeek and Transnational University of Limburg
Frank Neven, Hasselt University, Diepenbeek and Transnational University of Limburg
Roeland C.H.J. van Ham, Applied Bioinformatics - Plant Research International, Wageningen
Aalt D.J. van Dijk, Applied Bioinformatics - Plant Research International, Wageningen
ABSTRACT
Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach in which the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the chi-square-based support measure over other support measures. Furthermore, we show that CMM is NP-hard for a large class of support measures (including chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available at http://bioinformatics.uhasselt.be.
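To make the ideas in the abstract concrete, the following Python sketch scores a motif pair with a chi-square statistic and improves it by steepest ascent over "slid" motifs. It is an illustration under stated assumptions, not the published SLIDER implementation. The abstract does not specify the motif syntax, the exact contingency table, or the sliding neighborhood. Here motifs are assumed to be fixed-length strings over the amino acid alphabet with `.` as a wildcard, the support is assumed to be a one-sided Pearson chi-square on a 2x2 table (protein pair selected by the motif pair or not, versus interacting or not), and a slide is assumed to drop one end character and add a new character at the other end. All function names are hypothetical.

```python
import re
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def matches(motif, sequence):
    # Assumption: a motif is a string of amino acid letters and '.' wildcards,
    # so it can be used directly as a regular expression.
    return re.search(motif, sequence) is not None

def chi_square_support(motif_x, motif_y, sequences, interactions):
    """Chi-square on a 2x2 table: (pair selected by motifs) x (pair interacts)."""
    sel_x = {p for p, s in sequences.items() if matches(motif_x, s)}
    sel_y = {p for p, s in sequences.items() if matches(motif_y, s)}
    a = b = c = d = 0
    for u, v in combinations(sorted(sequences), 2):
        selected = (u in sel_x and v in sel_y) or (u in sel_y and v in sel_x)
        interacting = frozenset((u, v)) in interactions
        if selected and interacting:
            a += 1
        elif selected:
            b += 1
        elif interacting:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Assumption: only overrepresentation counts, so depleted pairs score 0.
    if denom == 0 or a * d <= b * c:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def slide_neighbors(motif):
    # Assumed sliding move: shift the motif one position left or right and
    # fill the freed position with any letter or a wildcard.
    for ch in ALPHABET + ".":
        yield motif[1:] + ch
        yield ch + motif[:-1]

def steepest_ascent(motif_x, motif_y, sequences, interactions):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    def score_of(pair):
        return chi_square_support(pair[0], pair[1], sequences, interactions)

    current = (motif_x, motif_y)
    score = score_of(current)
    while True:
        candidates = [(nx, current[1]) for nx in slide_neighbors(current[0])]
        candidates += [(current[0], ny) for ny in slide_neighbors(current[1])]
        best = max(candidates, key=score_of)
        best_score = score_of(best)
        if best_score <= score:
            return current, score  # local optimum reached
        current, score = best, best_score

# Toy network (made-up data): proteins containing "KL" interact with those containing "RS".
sequences = {"p1": "AKLM", "p2": "GGKL", "p3": "TTRS", "p4": "RSAA"}
interactions = {frozenset(e) for e in
                [("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p2", "p4")]}

print(chi_square_support("KL", "RS", sequences, interactions))
print(steepest_ascent("AK", "TT", sequences, interactions))
```

Because the score strictly increases at every step and the set of motif pairs of a given length is finite, the loop always terminates, though only at a local optimum. Scoring every protein pair is quadratic in the number of proteins, so this sketch is meant for reading, not for large networks.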
INDEX TERMS
Graphs and networks, biology and genetics.
CITATION
Peter Boyen, Dries Van Dyck, Frank Neven, Roeland C.H.J. van Ham, Aalt D.J. van Dijk, "SLIDER: A Generic Metaheuristic for the Discovery of Correlated Motifs in Protein-Protein Interaction Networks", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 5, pp. 1344-1357, September/October 2011, doi:10.1109/TCBB.2011.17