Issue No.04 - July/August (2011 vol.8)
pp: 1017-1028
Pietro Di Lena, University of Bologna, Bologna
Piero Fariselli, University of Bologna, Bologna
Luciano Margara, University of Bologna, Bologna
Marco Vassura, University of Bologna, Bologna
Rita Casadio, University of Bologna, Bologna
ABSTRACT
Correlated mutations in proteins are believed to occur in order to preserve the functional folding of the protein through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. The correlation between pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In the literature, there is no justification for adopting MCLACHLAN instead of other substitution matrices. In this paper, we approach the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform extensive experimental tests. Our tests show that there is a large number of optimal matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor performance of the correlated mutations approach may be due to the choice of a linear correlation function for evaluating correlated mutations.
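As an illustration of the kind of score the abstract refers to, the following is a minimal sketch of a Pearson correlation between the pairwise-similarity profiles of two alignment columns. It is not taken from the paper: the function names `correlated_mutation` and `identity_sim` are hypothetical, the identity similarity is a stand-in for a real substitution matrix such as MCLACHLAN, and sequence weighting and gap handling are omitted.

```python
import itertools
import math


def correlated_mutation(col_i, col_j, sim):
    """Pearson correlation between the similarity profiles of two columns.

    col_i, col_j: residues observed at positions i and j, one per aligned
                  sequence (same order, same length).
    sim:          callable sim(a, b) -> float, a stand-in for a lookup in a
                  substitution (similarity) matrix.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same alignment")

    # Every unordered pair of sequences (k, l) contributes one observation.
    pairs = list(itertools.combinations(range(len(col_i)), 2))
    s_i = [sim(col_i[k], col_i[l]) for k, l in pairs]
    s_j = [sim(col_j[k], col_j[l]) for k, l in pairs]

    n = len(pairs)
    mean_i = sum(s_i) / n
    mean_j = sum(s_j) / n

    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(s_i, s_j))
    var_i = sum((a - mean_i) ** 2 for a in s_i)
    var_j = sum((b - mean_j) ** 2 for b in s_j)

    # Assumption of this sketch: a fully conserved column has zero variance,
    # so the correlation is undefined and is reported as 0.
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)


def identity_sim(a, b):
    """Toy similarity: 1 for identical residues, 0 otherwise (not MCLACHLAN)."""
    return 1.0 if a == b else 0.0


if __name__ == "__main__":
    # Columns that change together across the four sequences.
    print(correlated_mutation("AAVV", "LLII", identity_sim))
    # Columns that change in an unrelated pattern.
    print(correlated_mutation("AAVV", "LILI", identity_sim))
```

Passing the similarity as a callable makes it easy to swap in a different matrix, which is the quantity the paper proposes to optimize; the sketch itself performs no optimization.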
INDEX TERMS
Protein contact prediction, correlated mutations, similarity matrix.
CITATION
Pietro Di Lena, Piero Fariselli, Luciano Margara, Marco Vassura, Rita Casadio, "Is There an Optimal Substitution Matrix for Contact Prediction with Correlated Mutations?", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 4, pp. 1017-1028, July/August 2011, doi:10.1109/TCBB.2010.91