Bayesian Segmental Models with Multiple Sequence Alignment Profiles for Protein Secondary Structure and Contact Map Prediction
April-June 2006 (vol. 3 no. 2)
pp. 98-113
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction that incorporates multiple sequence alignment profiles with the purpose of improving predictive performance. The segmental model is a generalization of the hidden Markov model in which a hidden state generates segments of variable length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles yields substantial improvements and that the generalization performance is promising. By incorporating information from long-range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at
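The abstract describes the segmental model only in words: a hidden state emits a whole segment with its own type and length, rather than one residue at a time as in an ordinary HMM. As a hedged illustration of that idea, the sketch below implements the generic forward recursion of a segmental semi-Markov model, which sums the likelihood over every possible segmentation of a sequence. It is not the authors' parameterization. The names `ssmm_log_likelihood`, `logsumexp`, `log_len` and `log_seg` are hypothetical. `log_seg` is a placeholder for a segment likelihood, which in the paper is the profile-based model and is not reproduced here. The sketch also omits the long-range β-sheet interactions used for contact-map inference.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ssmm_log_likelihood(seq, types, log_init, log_trans, log_len, log_seg, max_len):
    """Generic forward recursion for a segmental semi-Markov model (sketch).

    alpha[t][k] = log P(seq[:t], a segment of type k ends exactly at t).
    A segment of type k and length d scores
      log_init[k] (first segment) or log_trans[j][k] (from previous type j)
      + log_len(k, d) + log_seg(k, segment).
    log_len and log_seg are hypothetical user-supplied callables; log_seg
    stands in for a real (e.g. profile-based) segment likelihood.
    """
    n = len(seq)
    neg_inf = float("-inf")
    alpha = [{k: neg_inf for k in types} for _ in range(n + 1)]
    for t in range(1, n + 1):
        for k in types:
            terms = []
            # Consider every admissible length d of the segment ending at t.
            for d in range(1, min(max_len, t) + 1):
                start = t - d
                seg_score = log_len(k, d) + log_seg(k, seq[start:t])
                if start == 0:
                    terms.append(log_init[k] + seg_score)
                else:
                    # Marginalize over the type of the preceding segment.
                    prev = logsumexp([alpha[start][j] + log_trans[j][k] for j in types])
                    terms.append(prev + seg_score)
            alpha[t][k] = logsumexp(terms)
    # Sum over the type of the final segment.
    return logsumexp([alpha[n][k] for k in types])
```

Working in log space avoids numerical underflow on long proteins, and as written the recursion costs roughly O(n·D·K²) for sequence length n, maximum segment length D and K segment types. In a toy setting with types "HEC", one would typically set self-transitions `log_trans[k][k]` to negative infinity, a common convention for explicit-duration models (assumed here), because segment length is already modeled by `log_len`.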

Index Terms:
Bayesian segmental semi-Markov models, generative models, protein secondary structure, contact maps, multiple sequence alignment profiles, parametric models.
Wei Chu, Zoubin Ghahramani, Alexei Podtelezhnikov, David L. Wild, "Bayesian Segmental Models with Multiple Sequence Alignment Profiles for Protein Secondary Structure and Contact Map Prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 98-113, April-June 2006, doi:10.1109/TCBB.2006.17