Issue No.02 - March/April (2011 vol.8)
pp: 476-486
Qiwen Dong, Fudan University, Shanghai
Shuigeng Zhou, Fudan University, Shanghai
ABSTRACT
Predicting the 3D structures of proteins from their amino acid sequences is one of the most challenging problems in molecular biology. An essential task in solving this problem with coarse-grained models is to deduce effective interaction potentials, and the development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structure. Almost all current knowledge-based potentials take the form of a weighted linear sum over interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers rather than from relative frequencies of interaction pairs against a reference state or from linear classifiers. A support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force potentials, either Boltzmann-based or linear, are introduced, and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys 'R' Us data set. The evaluation metrics are the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures.
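The two evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the sign convention (a more negative Z score means the native structure is better separated from the decoys), the use of the population standard deviation, and the function names `energy_z_score` and `native_rank` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def energy_z_score(native_energy, decoy_energies):
    """Z score of the native energy relative to the decoy energy distribution.

    Assumed convention: (E_native - mean(E_decoys)) / std(E_decoys),
    so a strongly negative value indicates good separation.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)  # population std; an assumption
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native structure when sorted by ascending energy.

    Rank 1 means no decoy scores lower than the native structure.
    """
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


# Toy energies (arbitrary units), invented for this example.
decoys = [-10.0, -12.0, -8.0, -14.0, -6.0]
native = -20.0

z = energy_z_score(native, decoys)
rank = native_rank(native, decoys)
print(f"Z score: {z:.3f}, native rank: {rank}")
```

Under these assumptions, a potential discriminates a decoy set successfully when the native structure has rank 1, and the Z score indicates how wide the margin is.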
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
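For readers unfamiliar with the baseline that the nonlinear potentials are compared against, the following minimal sketch shows one common way a Boltzmann-based potential is built from relative frequencies against a reference state, and how a structure is then scored as a weighted linear sum over its interaction pairs. It is not the authors' implementation: the pair labels, the uniform reference state, the pseudocount, and the function names `boltzmann_potential` and `linear_energy` are hypothetical. The nonlinear potentials described in the abstract replace this linear sum with the output of a support vector machine, which is not shown here.

```python
import math
from collections import Counter


def boltzmann_potential(observed_pairs, reference_prob, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann potential: E(pair) = -kT * ln(P_obs(pair) / P_ref(pair)).

    observed_pairs: iterable of pair labels seen in native structures.
    reference_prob: dict mapping every pair label to its reference probability.
    The pseudocount avoids log(0) for unobserved pairs (an assumption).
    """
    counts = Counter(observed_pairs)
    total = sum(counts.values()) + pseudocount * len(reference_prob)
    potential = {}
    for pair, p_ref in reference_prob.items():
        p_obs = (counts[pair] + pseudocount) / total
        potential[pair] = -kT * math.log(p_obs / p_ref)
    return potential


def linear_energy(structure_pairs, potential):
    """Score a structure as a linear sum of pair counts times pair energies."""
    counts = Counter(structure_pairs)
    return sum(n * potential[pair] for pair, n in counts.items())


# Toy data: three hypothetical pair types with a uniform reference state.
reference = {"AA": 1 / 3, "AB": 1 / 3, "BB": 1 / 3}
observed = ["AB"] * 6 + ["AA"] * 2 + ["BB"] * 1

potential = boltzmann_potential(observed, reference)
energy = linear_energy(["AB", "AB", "AA"], potential)
print(potential, energy)
```

Pairs observed more often than the reference state predicts receive negative (favorable) energies, while rarer pairs receive positive ones.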
INDEX TERMS
Mean force potential, nonlinear potential, protein structure prediction, protein docking.
CITATION
Qiwen Dong, Shuigeng Zhou, "Novel Nonlinear Knowledge-Based Mean Force Potentials Based on Machine Learning," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 476-486, March/April 2011, doi:10.1109/TCBB.2010.86