Exploiting Intrastructure Information for Secondary Structure Prediction with Multifaceted Pipelines
May-June 2012 (vol. 9 no. 3)
pp. 799-808
G. Armano, Dept. of Electr. & Electron. Eng., Univ. of Cagliari, Cagliari, Italy
F. Ledda, Dept. of Electr. & Electron. Eng., Univ. of Cagliari, Cagliari, Italy
Predicting the secondary structure of proteins is still a typical step in several bioinformatic tasks, in particular, for tertiary structure prediction. Notwithstanding the impressive results obtained so far, mostly due to the advent of sequence encoding schemes based on multiple alignment, in our view the problem should be studied from a novel perspective, in which understanding how available information sources are dealt with plays a central role. After revisiting a well-known secondary structure predictor viewed from this perspective (with the goal of identifying which sources of information have been considered and which have not), we propose a generic software architecture designed to account for all relevant information sources. To demonstrate the validity of the approach, a predictor compliant with the proposed generic architecture has been implemented and compared with several state-of-the-art secondary structure predictors. Experiments have been carried out on standard data sets, and the corresponding results confirm the validity of the approach. The predictor is available through the corresponding web application or as a downloadable stand-alone portable unpack-and-run bundle.

Index Terms:
proteins, bioinformatics, information systems, Internet, molecular biophysics, downloadable stand-alone portable unpack-run bundle, exploiting intrastructure information, secondary structure prediction, multifaceted pipelines, bioinformatic tasks, tertiary structure prediction, sequence encoding schemes, generic software architecture, generic architecture, web application, Encoding, Correlation, Pipelines, Amino acids, Computer architecture, Prediction algorithms, artificial neural networks, protein encoding schemes, ensemble architectures
G. Armano, F. Ledda, "Exploiting Intrastructure Information for Secondary Structure Prediction with Multifaceted Pipelines," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 3, pp. 799-808, May-June 2012, doi:10.1109/TCBB.2011.159