Issue No. 06, November/December 2011 (vol. 8)
pp: 1708-1715
Qingguo Wang, University of Missouri, Columbia
Yi Shang, University of Missouri, Columbia
Dong Xu, University of Missouri, Columbia
ABSTRACT
In protein tertiary structure prediction, a crucial step is to select near-native structures from a large number of predicted structural models. Over the years, extensive research has been conducted on the protein structure selection problem, with most approaches focusing on developing more accurate energy or scoring functions. Despite significant advances in this area, the discriminating power of current approaches is still unsatisfactory. In this paper, we propose a novel consensus-based algorithm for selecting predicted protein structures. Given a set of predicted models, our method first removes redundant structures to derive a subset of reference models. Each structure is then ranked by its average pairwise similarity to the reference models. Using the CASP8 data set, which contains a large collection of predicted models for 122 targets, we compared our method with the best CASP8 quality assessment (QA) servers, all of which are consensus based, and showed that our QA scores correlate better with the GDT-TS scores than those of the CASP8 QA servers do. We also compared our method with state-of-the-art scoring functions and showed that it improves near-native model selection. On average, the GDT-TS scores of the top models picked by our method are more than 8 percent higher than those of the models selected by the best-performing scoring function.
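The two-step ranking summarized above (remove redundant models, then score each model by its average similarity to the remaining reference models) can be illustrated with a short sketch. This is not the authors' implementation. It assumes a precomputed, symmetric pairwise similarity matrix between models (for example, pairwise GDT-TS scaled to [0, 1]) and a greedy redundancy filter driven by a hypothetical `redundancy_cutoff`. As a choice of this sketch only, each model's average skips any reference it is redundant with, including itself, so near-duplicates of a reference are not credited with their own copy. The helper names `select_reference_models`, `consensus_scores`, and `pearson` are hypothetical; `pearson` shows how QA scores could be correlated against true GDT-TS values.

```python
from typing import List, Sequence


def select_reference_models(sim: Sequence[Sequence[float]],
                            redundancy_cutoff: float) -> List[int]:
    """Greedily keep a model only if it is not too similar to any kept model.

    Assumption: sim[i][j] is a symmetric similarity in [0, 1] between models
    i and j; redundancy_cutoff is a hypothetical threshold, not the paper's.
    """
    refs: List[int] = []
    for i in range(len(sim)):
        if all(sim[i][r] < redundancy_cutoff for r in refs):
            refs.append(i)
    return refs


def consensus_scores(sim: Sequence[Sequence[float]],
                     redundancy_cutoff: float = 0.9) -> List[float]:
    """Score each model by its mean similarity to the reference models."""
    refs = select_reference_models(sim, redundancy_cutoff)
    scores: List[float] = []
    for i in range(len(sim)):
        # Sketch-specific choice: skip references redundant with model i
        # (this always skips model i itself, since sim[i][i] == 1).
        others = [r for r in refs if sim[i][r] < redundancy_cutoff]
        if others:
            scores.append(sum(sim[i][r] for r in others) / len(others))
        else:
            scores.append(0.0)
    return scores


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between QA scores and true GDT-TS values.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

For a toy matrix in which models 0 and 1 are near-duplicates, `sim = [[1.0, 0.97, 0.6, 0.3], [0.97, 1.0, 0.58, 0.32], [0.6, 0.58, 1.0, 0.4], [0.3, 0.32, 0.4, 1.0]]`, a cutoff of 0.9 drops model 1 as redundant, leaving references `[0, 2, 3]`. The scores come out at approximately `[0.45, 0.45, 0.5, 0.35]`, so model 2 would be picked as the top model.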
INDEX TERMS
Protein tertiary structure, protein structure selection, quality assessment, consensus approach, metapredictor, critical assessment of protein structure prediction.
CITATION
Qingguo Wang, Yi Shang, Dong Xu, "Improving a Consensus Approach for Protein Structure Selection by Removing Redundancy", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 6, pp. 1708-1715, November/December 2011, doi:10.1109/TCBB.2011.75