Issue No.05 - Sept.-Oct. (2013 vol.10)
pp: 1176-1187
Yao-ming Huang, Dept. of Bioeng. & Therapeutic Sci., Univ. of California, San Francisco, San Francisco, CA, USA
Christopher Bystroff, Rensselaer Polytech. Inst., Troy, NY, USA
ABSTRACT
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
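As a rough illustration of the optimization described in the abstract, the sketch below runs a Metropolis-style Monte Carlo search over the weights of a toy linear energy function. Everything in it is an assumption made for illustration and is not taken from the paper: the linear form of the energy, the objective (the fraction of positions at which the native amino acid receives the lowest energy), the synthetic data, and the names `native_recovery`, `monte_carlo_search`, and `make_toy_positions`.

```python
import math
import random


def native_recovery(weights, positions):
    """Fraction of positions whose native candidate has the lowest energy.

    Each position is (native_index, candidates); candidates is a list of
    feature vectors, and energy = dot(weights, features). Hypothetical
    objective chosen for this sketch only.
    """
    hits = 0
    for native, candidates in positions:
        energies = [sum(w * f for w, f in zip(weights, feats))
                    for feats in candidates]
        if energies.index(min(energies)) == native:
            hits += 1
    return hits / len(positions)


def monte_carlo_search(positions, n_terms, steps=500, step_size=0.2,
                       temperature=0.05, seed=0):
    """Metropolis search over energy-term weights, maximizing native_recovery."""
    rng = random.Random(seed)
    weights = [1.0] * n_terms
    score = native_recovery(weights, positions)
    best_weights, best_score = list(weights), score
    for _ in range(steps):
        # Perturb one randomly chosen weight.
        trial = list(weights)
        trial[rng.randrange(n_terms)] += rng.gauss(0.0, step_size)
        trial_score = native_recovery(trial, positions)
        delta = trial_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            weights, score = trial, trial_score
            if score > best_score:
                best_weights, best_score = list(weights), score
    return best_weights, best_score


def make_toy_positions(true_weights, n_positions=40, n_candidates=5, seed=1):
    """Synthetic data: the 'native' is the lowest-energy candidate under true_weights."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_positions):
        candidates = [[rng.uniform(-1.0, 1.0) for _ in true_weights]
                      for _ in range(n_candidates)]
        energies = [sum(w * f for w, f in zip(true_weights, feats))
                    for feats in candidates]
        positions.append((energies.index(min(energies)), candidates))
    return positions


TRUE_WEIGHTS = [2.0, -1.0, 0.5]  # hidden weights used only to build the toy set
positions = make_toy_positions(TRUE_WEIGHTS)
start_score = native_recovery([1.0, 1.0, 1.0], positions)
best_weights, best_score = monte_carlo_search(positions, n_terms=len(TRUE_WEIGHTS))
print(f"recovery before: {start_score:.2f}, after search: {best_score:.2f}")
```

In a real setting the optimized weights would then be scored on a held-out test set, which is the cross-validation step the abstract refers to; swapping in a different objective function only requires replacing `native_recovery`.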
INDEX TERMS
Proteins, Linear programming, Amino acids, Hydrogen, Bonding, Optimization, dead-end elimination, Biology and genetics, physics, chemistry, protein design, energy function, machine learning, correlation, rotamers
CITATION
Yao-ming Huang, Christopher Bystroff, "Expanded Explorations into the Optimization of an Energy Function for Protein Design", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 5, pp. 1176-1187, Sept.-Oct. 2013, doi:10.1109/TCBB.2013.113