SEGA: Semiglobal Graph Alignment for Structure-Based Protein Comparison
September/October 2011 (vol. 8 no. 5)
pp. 1330-1343
Marco Mernberger, Philipps University, Marburg
Gerhard Klebe, Philipps University, Marburg
Eyke Hüllermeier, Philipps University, Marburg
Comparative analysis is a topic of utmost importance in structural bioinformatics. Recently, a structural counterpart to sequence alignment, called multiple graph alignment, was introduced as a tool for comparing protein structures in general and protein binding sites in particular. Using approximate graph matching techniques, this method enables the identification of approximately conserved patterns in functionally related structures. In this paper, we introduce a new method for computing graph alignments, motivated by two problems of the original approach: one conceptual and one computational. First, the existing approach is of limited use for structures that share only common substructures. Second, the goal of finding a globally optimal alignment leads to an optimization problem that is computationally intractable. To overcome these disadvantages, we propose a semiglobal approach to graph alignment that, in analogy to semiglobal sequence alignment, combines the advantages of local and global graph matching.
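
The sequence analogy mentioned in the abstract can be made concrete with a small sketch. The code below is not the SEGA method and does not operate on graphs; it only illustrates semiglobal (ends-free) alignment of two strings, in which unaligned overhangs at either end of either sequence are not penalized. The function name `semiglobal_score` and the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions, not taken from the paper.

```python
def semiglobal_score(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best semiglobal (ends-free) alignment of a and b.

    Illustrative only: scoring values are assumptions, and this is the
    sequence analogue, not the graph alignment method of the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    # Row 0 and column 0 stay at 0, so leading overhangs are free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    # Trailing overhangs are free as well: the alignment may end anywhere
    # in the last row (a exhausted) or the last column (b exhausted).
    return max(max(score[n]), max(row[m] for row in score))
```

With these assumed scores, a global alignment of `"GGGACGT"` and `"ACGT"` would pay for three gaps, whereas the semiglobal variant ignores the unmatched prefix and scores only the shared part. This mirrors the motivation stated in the abstract: structures that share only a common substructure should not be penalized for the parts that have no counterpart.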

Index Terms:
Approximate graph matching, protein binding sites, structure comparison, graph alignment, structural bioinformatics.
Marco Mernberger, Gerhard Klebe, Eyke Hüllermeier, "SEGA: Semiglobal Graph Alignment for Structure-Based Protein Comparison," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 5, pp. 1330-1343, Sept.-Oct. 2011, doi:10.1109/TCBB.2011.35