Fold Recognition by Predicted Alignment Accuracy
April-June 2005 (vol. 2 no. 2)
pp. 157-165

Abstract: A key step in protein structure prediction by threading is choosing the best overall template for a given target sequence once all optimal sequence-template alignments have been generated. The chosen template should have the best alignment with the target sequence, because the three-dimensional structure of the target is built from that sequence-template alignment. The traditional template selection method, the Z-score, uses a statistical test to rank all sequence-template alignments and then selects the top-ranked template. However, calculating Z-scores is time-consuming and therefore unsuitable for genome-scale structure prediction. Z-scores are also hard to interpret when the threading scoring function is a weighted sum of several energy terms with different physical meanings. This paper presents a Support Vector Machine (SVM) regression approach that directly predicts the alignment accuracy of a sequence-template alignment; the predicted accuracy is then used to rank all templates for a specific target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method and also runs much faster.
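
The two ranking strategies contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. `toy_score` is a hypothetical stand-in for a threading score (higher is better here), the shuffle-based `z_score` is one common way to build such a statistic and is assumed rather than taken from the paper, and the "predicted accuracy" in the usage lines is a placeholder for the output of a trained SVM regression model.

```python
import random
import statistics

def toy_score(sequence, template):
    """Hypothetical stand-in for a threading score: count of identical positions."""
    return sum(a == b for a, b in zip(sequence, template))

def z_score(sequence, template, n_shuffles=100, seed=0):
    """Standardize the raw score against scores of composition-preserving shuffles."""
    rng = random.Random(seed)
    raw = toy_score(sequence, template)
    shuffled_scores = []
    for _ in range(n_shuffles):
        residues = list(sequence)
        rng.shuffle(residues)  # same residue composition, random order
        shuffled_scores.append(toy_score(residues, template))
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        return 0.0
    return (raw - statistics.mean(shuffled_scores)) / spread

def rank_templates(templates, score_fn):
    """Return template names ordered from highest to lowest score."""
    return sorted(templates, key=lambda name: score_fn(templates[name]), reverse=True)

# Toy data, invented for illustration only.
target = "ACDEFGHIKL"
templates = {"t1": "ACDEFGHIKL", "t2": "LKIHGFEDCA", "t3": "ACDEFAAAAA"}

# Z-score ranking: every template costs n_shuffles extra scorings.
by_z = rank_templates(templates, lambda t: z_score(target, t))

# Ranking by predicted accuracy: one evaluation per template.
# The fraction of matching positions stands in for a regressor's prediction.
by_pred = rank_templates(templates, lambda t: toy_score(target, t) / len(target))
```

In a real threader each shuffled sequence would need its own alignment against the template, which is where the cost of the Z-score comes from. Ranking by a predicted accuracy needs only one model evaluation per alignment that has already been computed.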

Index Terms:
Protein structure prediction, protein threading, protein fold recognition, SVM regression.
Jinbo Xu, "Fold Recognition by Predicted Alignment Accuracy," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 2, pp. 157-165, April-June 2005, doi:10.1109/TCBB.2005.24