Issue No.05 - Sept.-Oct. (2012 vol.9)
pp: 1492-1503
John C. Hawkins, Struct. Bioinf., Tech. Univ. Dresden, Dresden, Germany
Hongbo Zhu, Struct. Bioinf., Tech. Univ. Dresden, Dresden, Germany
Joan Teyra, Struct. Bioinf., Tech. Univ. Dresden, Dresden, Germany
M. Teresa Pisabarro, Struct. Bioinf., Tech. Univ. Dresden, Dresden, Germany
ABSTRACT
Identifying the binding partners of proteins is a problem of fundamental importance in computational biology. The PDZ domain is one of the most common and well-studied protein binding domains, hence it is a perfect model system for designing protein binding predictors. The standard approach to identifying the binding partners of PDZ domains uses multiple sequence alignments to infer the set of contact residues that are used in a predictive model. We expand on the sequence alignment approach by incorporating structural information to generate descriptors of the binding site geometry. Furthermore, we generate a real-valued score for binary predictions by applying a filter based on models that predict the probability distributions of contact residues at each of the canonical PDZ ligand binding positions. Under training cross-validation, our model produced an order of magnitude more predictions at a false positive proportion (FPP) of 10 percent than our benchmark model chosen from the literature. Evaluated using an independent cross-validation with computationally predicted structures, our model was able to make five times as many predictions as the benchmark model, with a Matthews correlation coefficient (MCC) of 0.33. In addition, our model achieved a false positive proportion of 0.14, compared with 0.25 for the benchmark model.
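For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how they can be computed from confusion-matrix counts. The MCC formula is the standard one. The abstract does not define FPP, so the sketch assumes it is the fraction of positive predictions that are false, FP / (TP + FP); that definition and the function name `confusion_metrics` are illustrative assumptions, not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (mcc, fpp) from confusion-matrix counts.

    mcc: Matthews correlation coefficient (standard definition).
    fpp: false positive proportion, ASSUMED here to be FP / (TP + FP);
         the abstract does not spell out its definition.
    """
    # The MCC denominator is zero when any marginal total is zero;
    # returning 0.0 in that case is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Proportion of positive predictions that are wrong (assumed definition).
    predicted_pos = tp + fp
    fpp = fp / predicted_pos if predicted_pos else 0.0
    return mcc, fpp


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    mcc, fpp = confusion_metrics(tp=9, fp=1, tn=9, fn=1)
    print(f"MCC = {mcc:.2f}, FPP = {fpp:.2f}")
```

The counts in the usage example are made up for illustration and are unrelated to the results reported above.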
INDEX TERMS
proteins, biological techniques, molecular biophysics, probability, Matthews correlation coefficient, reduced false positive proportion, PDZ binding prediction, structural descriptors, sequence descriptors, computational biology, protein binding domains, multiple sequence alignments, structural information, binding site geometry, probability distributions, benchmark model, Peptides, Proteins, Encoding, Predictive models, Probability distribution, Computational modeling, Data models, protein structure classification, PDZ binding, protein binding prediction, machine learning
CITATION
John C. Hawkins, Hongbo Zhu, Joan Teyra, M. Teresa Pisabarro, "Reduced False Positives in PDZ Binding Prediction Using Sequence and Structural Descriptors", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 5, pp. 1492-1503, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.54