Issue No.01 - January/February (2012 vol.9)
pp: 249-261
Hong Sun, The Ohio State University, Columbus
Ahmet Sacan, Drexel University, Philadelphia
Hakan Ferhatosmanoglu, The Ohio State University, Columbus
Yusu Wang, The Ohio State University, Columbus
ABSTRACT
Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on three steps: building a contact-window based motif library from the protein structural data, discovering and extending common alignment seeds from this library, and optimally superimposing multiple structures according to these alignment seeds using an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to capture alignments globally, and to support flexible alignments yields a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. Our method yields better alignment results than other popular MSTA methods on several protein structure data sets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed.drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.
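The index terms below mention distance maps and contact maps, the residue-level representations that contact-based motif libraries are typically built on. The following is a minimal Python sketch of those two standard definitions only; it is not the Smolign implementation. The function names, the distance cutoff, the minimum sequence separation, and the toy coordinates are all assumptions made for illustration.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the set of index pairs (i, j), i < j, whose distance is <= cutoff.

    `cutoff` (in angstroms) and `min_separation` (in residues) are
    illustrative defaults, not values taken from the paper.
    """
    dmap = distance_map(coords)
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dmap[i][j] <= cutoff
    }


# Made-up C-alpha trace shaped like a hairpin: residues 0-3 run along x,
# residues 4-7 run back alongside them at y = 3.8.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]

contacts = contact_map(toy_coords, cutoff=5.0, min_separation=3)
print(sorted(contacts))  # [(0, 7), (1, 6), (2, 5)]
```

In this toy example the three contacts pair each residue on the first strand with its neighbor on the returning strand, which is the kind of non-consecutive spatial pattern that a comparison restricted to consecutive fragments would not see directly.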
INDEX TERMS
Proteins, Protein engineering, Libraries, Measurement uncertainty, Sun, Educational institutions, Computer science, HOMSTRAD, Protein structure, multiple structure alignment, partial order curve comparison, structural motif library, secondary structure elements (SSE), distance map, contact map
CITATION
Hong Sun, Ahmet Sacan, Hakan Ferhatosmanoglu, Yusu Wang, "Smolign: A Spatial Motifs-Based Protein Multiple Structural Alignment Method", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 1, pp. 249-261, January/February 2012, doi:10.1109/TCBB.2011.67