Issue No. 01, Jan.-Feb. 2013 (vol. 10)
pp. 73-86
Peter Boyen, Hasselt Univ. & Transnat. Univ. of Limburg, Diepenbeek, Belgium
Frank Neven, Hasselt Univ. & Transnat. Univ. of Limburg, Diepenbeek, Belgium
Dries van Dyck, Adv. Nucl. Syst., Nucl. Syst. Res., Belgian Nucl. Res. Centre (SCK-CEN), Mol, Belgium
Felipe L. Valentim, Appl. Bioinf., Plant Res. Int., Wageningen, Netherlands
Aalt D. J. van Dijk, Appl. Bioinf., Plant Res. Int., Wageningen, Netherlands
Correlated motif covering (CMC) is the problem of finding a set of motif pairs, i.e., pairs of patterns, in the sequences of proteins from a protein-protein interaction network (PPI-network) that describe the interactions in the network as concisely as possible. In other words, a perfect solution for CMC would be a minimal set of motif pairs that describes the interaction behavior perfectly, in the sense that two proteins from the network interact if and only if their sequences match a motif pair in the minimal set. In this paper, we introduce and formally define CMC and show that it is closely related to the red-blue set cover (RBSC) problem and its weighted version (WRBSC), both well-known NP-hard problems for which several algorithms with known approximation factor guarantees exist. We prove the hardness of approximation of CMC by providing an approximation factor preserving reduction from RBSC to CMC. We show the existence of a theoretical approximation algorithm for CMC by providing an approximation factor preserving reduction from CMC to WRBSC. We adapt the latter algorithm into a functional heuristic for CMC, called CMC-approx, and experimentally assess its performance and biological relevance. The implementation in Java can be found at http://
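To make the link between CMC and red-blue set cover concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's. A motif is taken to be a fixed-length string in which "." matches any single residue. A motif pair covers an unordered pair of distinct proteins if the two sequences match its two motifs in either orientation. Covered interacting pairs play the role of blue elements and covered non-interacting pairs the role of red elements. The selection step is a generic ratio greedy, not CMC-approx and not one of the approximation algorithms with proven guarantees mentioned above. All function names and the toy data are hypothetical.

```python
from itertools import combinations

# Hypothetical motif model (our assumption, not the paper's): a motif is a
# fixed-length string in which "." matches any single residue.
def matches(sequence: str, motif: str) -> bool:
    """True if some window of `sequence` matches `motif` position by position."""
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if all(m == "." or m == s for m, s in zip(motif, window)):
            return True
    return False


def covered_pairs(motif_pair, sequences):
    """Unordered pairs of distinct proteins matched by the motif pair in either orientation."""
    m1, m2 = motif_pair
    covered = set()
    for p, q in combinations(sorted(sequences), 2):
        sp, sq = sequences[p], sequences[q]
        if (matches(sp, m1) and matches(sq, m2)) or (matches(sp, m2) and matches(sq, m1)):
            covered.add(frozenset((p, q)))
    return covered


def red_blue_instance(motif_pairs, sequences, interactions):
    """Map each motif pair to (blue, red): covered interacting and covered non-interacting pairs."""
    edges = {frozenset(e) for e in interactions}
    instance = {}
    for mp in motif_pairs:
        cov = covered_pairs(mp, sequences)
        instance[mp] = (cov & edges, cov - edges)
    return instance


def greedy_cover(instance):
    """Generic ratio greedy (not CMC-approx): cover every coverable blue pair while
    adding as few new red pairs as possible per newly covered blue pair.
    Interactions that no candidate motif pair covers are ignored."""
    blue_left = set()
    for blue, _ in instance.values():
        blue_left |= blue
    red_covered, chosen = set(), []
    while blue_left:
        best, best_ratio = None, None
        for mp, (blue, red) in instance.items():
            gain = len(blue & blue_left)
            if gain == 0:
                continue
            ratio = len(red - red_covered) / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = mp, ratio
        chosen.append(best)
        blue_left -= instance[best][0]
        red_covered |= instance[best][1]
    return chosen, red_covered


# Toy data, invented purely for illustration.
sequences = {"A": "MKRGW", "B": "PPLPE", "C": "MKQGW", "D": "AAAAA"}
interactions = [("A", "B"), ("B", "C"), ("A", "D")]
candidates = [("K.G", "PLP"), ("AAA", "W"), ("M", ".")]

chosen, red = greedy_cover(red_blue_instance(candidates, sequences, interactions))
print(chosen)    # [('K.G', 'PLP'), ('AAA', 'W')]
print(len(red))  # 1: the non-interacting pair {C, D} is also covered
```

In this toy run the greedy choice covers all three interactions at the price of one false positive, whereas the single pair ("M", ".") would cover the same interactions with two. Under this encoding, a perfect CMC solution in the sense described above corresponds to every interaction being covered while `red` stays empty.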
Proteins, Approximation methods, Approximation algorithms, Bioinformatics, Silicon, IEEE transactions, local search, Graphs and networks, biology and genetics, correlated motifs, PPI networks
Peter Boyen, Frank Neven, Dries van Dyck, Felipe L. Valentim, Aalt D. J. van Dijk, "Mining Minimal Motif Pair Sets Maximally Covering Interactions in a Protein-Protein Interaction Network," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 1, pp. 73-86, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.165