Issue No.01 - January/February (2012 vol.9)
pp: 203-213
A. Passerini, DISI Dipt. di Ing. e Scienza dell'Inf., Univ. degli Studi di Trento, Trento, Italy
M. Lippi, DII Dipt. di Ing. dell'Inf., Univ. degli Studi di Siena, Siena, Italy
P. Frasconi, DSI Dipt. di Sist. e Inf., Univ. degli Studi di Firenze, Firenze, Italy
ABSTRACT
Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous number of alternative candidate configurations. Previous research has only considered this prediction problem starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available. The sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from the ones used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds.
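The role of the matroid property can be illustrated with a toy sketch. This is not the authors' formulation or scoring function: it assumes a simplified setting in which each candidate is a (residue, ion) pair with a made-up score, and a set of bonds is feasible when no residue appears in more than one bond (a partition matroid). Under that assumption, picking candidates greedily by decreasing score yields a maximum-weight feasible set. The names `greedy_bonds` and `scores`, and the residue labels, are hypothetical.

```python
from typing import Dict, List, Tuple

Bond = Tuple[str, int]  # (residue label, hypothetical ion index)


def greedy_bonds(scores: Dict[Bond, float]) -> List[Bond]:
    """Greedily select bonds so that each residue binds at most one ion.

    Assumption: feasibility is "no residue appears twice", which is a
    partition matroid, so the greedy choice is optimal for this toy model.
    """
    chosen: List[Bond] = []
    used_residues = set()
    # Visit candidates from highest to lowest score (ties broken by bond).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for bond, score in ranked:
        if score <= 0:
            break  # remaining candidates cannot increase the total score
        residue, _ion = bond
        if residue not in used_residues:
            chosen.append(bond)
            used_residues.add(residue)
    return chosen


# Made-up scores for three residues and two candidate ions.
scores = {
    ("C12", 0): 0.9,
    ("C12", 1): 0.4,
    ("H30", 0): 0.7,
    ("H30", 1): 0.8,
    ("C45", 1): -0.2,
}
result = greedy_bonds(scores)
print(result)
```

In this toy run, residue C45 is left unbound because its only candidate bond has a negative score, and each of the other residues keeps only its best-scoring bond.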
INDEX TERMS
Metals, Proteins, Ions, Bonding, Greedy algorithms, Bioinformatics, Three dimensional displays, Metal-binding prediction, machine learning, structured-output learning
CITATION
A. Passerini, M. Lippi, P. Frasconi, "Predicting Metal-Binding Sites from Protein Sequence", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 1, pp. 203-213, January/February 2012, doi:10.1109/TCBB.2011.94