IEEE/ACM Transactions on Computational Biology and Bioinformatics


Vol. 10, No. 1, Jan.-Feb. 2013, pp. 26-36

Inken Wohlers, Genominformatik, Univ. Duisburg-Essen/Universitätsklinikum, Essen, Germany

Rumen Andonov, INRIA Rennes - Bretagne Atlantique, Rennes, France

Gunnar W. Klau, Life Sci. Group, Centrum Wiskunde & Inf., Amsterdam, Netherlands

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.143

ABSTRACT

We present a mathematical model and an exact algorithm for optimally aligning protein structures under the DALI scoring model. This scoring model is based on comparing the interresidue distance matrices of proteins and is used in the popular DALI software tool, a heuristic method for protein structure alignment. Our model and algorithm extend an integer linear programming approach that has previously been applied to the related, but simpler, contact map overlap problem. To this end, we introduce a novel type of constraint that handles negative score values and relax it in a Lagrangian fashion. The new algorithm, which we call DALIX, is applicable to any distance matrix-based scoring scheme. We also review options for considering fewer pairs of interresidue distances explicitly, since their large number hinders the optimization process. Using four known data sets of varying structural similarity, we compute many provably score-optimal DALI alignments. This allowed us, for the first time, to evaluate the DALI heuristic in sound mathematical terms. The results indicate that DALI usually computes optimal or close-to-optimal alignments. However, we detect a subset of small proteins for which DALI fails to generate any significant alignment, although such alignments do exist.
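To make the distance matrix-based objective concrete, here is a minimal Python sketch of a DALI-style score for a given alignment. It assumes the elastic similarity score from the original DALI description, with threshold θ = 0.2 and envelope width α = 20 Å; treat these parameter values, and the convention that a residue paired with itself contributes θ, as assumptions of this sketch. Matched distance pairs that deviate by more than 20% of their mean receive negative scores, which is the kind of negative contribution the abstract says the new constraint must handle. The helper names (`elastic_score`, `dali_score`, `best_alignment_bruteforce`) are hypothetical, and the exhaustive search is only a stand-in that yields a provable optimum on toy inputs. It is not the integer linear programming and Lagrangian relaxation method that DALIX uses.

```python
import itertools
import math

# Assumed DALI elastic-score parameters (values from the original DALI description):
# THETA is the relative-deviation threshold, ALPHA the envelope width in Angstrom.
THETA = 0.2
ALPHA = 20.0


def distance_matrix(coords):
    """Interresidue distance matrix from a list of 3D C-alpha coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def elastic_score(d_a, d_b):
    """Score for matching distance d_a in protein A with distance d_b in protein B.

    Positive when the two distances deviate by less than THETA relative to their
    mean, negative otherwise; the exponential envelope down-weights long distances.
    """
    mean = (d_a + d_b) / 2.0
    if mean == 0.0:
        # Diagonal term (an aligned residue paired with itself) contributes THETA.
        return THETA
    return (THETA - abs(d_a - d_b) / mean) * math.exp(-((mean / ALPHA) ** 2))


def dali_score(dist_a, dist_b, alignment):
    """Sum of elastic scores over all ordered pairs of aligned residue pairs."""
    return sum(
        elastic_score(dist_a[i][j], dist_b[k][l])
        for i, k in alignment
        for j, l in alignment
    )


def best_alignment_bruteforce(dist_a, dist_b):
    """Hypothetical exhaustive search over order-preserving alignments (tiny inputs only).

    Illustration only: this is NOT the ILP / Lagrangian relaxation method of DALIX.
    """
    n, m = len(dist_a), len(dist_b)
    best_score, best_aln = 0.0, []
    for size in range(1, min(n, m) + 1):
        for rows in itertools.combinations(range(n), size):
            for cols in itertools.combinations(range(m), size):
                aln = list(zip(rows, cols))
                score = dali_score(dist_a, dist_b, aln)
                if score > best_score:
                    best_score, best_aln = score, aln
    return best_score, best_aln


if __name__ == "__main__":
    # Hypothetical toy "protein": four collinear residues 3.8 Angstrom apart.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    d = distance_matrix(coords)
    print(elastic_score(4.0, 8.0))  # negative: the distances differ by far more than 20%
    print(best_alignment_bruteforce(d, d))  # aligning the toy structure with itself
```

For two four-residue toys the search scores only 69 non-empty alignments. That count grows combinatorially with protein length, however, so exhaustive enumeration cannot replace a dedicated exact method on real structures.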

INDEX TERMS

Proteins, Mathematical model, Dynamic programming, Linear programming, Computational biology, Bioinformatics, Amino acids, DALI, Structure alignment, interresidue distance matrix, exact algorithm, integer linear program, Lagrangian relaxation

CITATION

Inken Wohlers, Rumen Andonov, Gunnar W. Klau, "DALIX: Optimal DALI Protein Structure Alignment," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 10, no. 1, pp. 26-36, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.143.