IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2013, vol. 10, Issue No. 06, Nov.-Dec.

pp: 1372-1383

Tim Wylie, Dept. of Comput. Sci., Montana State Univ., Bozeman, MT, USA

Binhai Zhu, Dept. of Comput. Sci., Montana State Univ., Bozeman, MT, USA

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.17

ABSTRACT

For protein structure alignment and comparison, a lot of work has been done using RMSD as the distance measure, which has drawbacks under certain circumstances. Thus, the discrete Fréchet distance was recently applied to the problem of protein (backbone) structure alignment and comparison with promising results. For this problem, visualization is also important because protein chain backbones can have as many as 500-600 α-carbon atoms, which constitute the vertices in the comparison. Even with an excellent alignment, the similarity of two polygonal chains can be difficult to visualize unless the chains are nearly identical. Thus, the chain pair simplification problem (CPS-3F) was proposed in 2008 to simultaneously simplify both chains with respect to each other under the discrete Fréchet distance. The complexity of CPS-3F is unknown, so heuristic methods have been developed. Here, we define a variation of CPS-3F, called the constrained CPS-3F problem (CPS-3F⁺), and prove that it is polynomially solvable by presenting a dynamic programming solution, which we then prove is a factor-2 approximation for CPS-3F. We then compare the CPS-3F⁺ solutions with previous empirical results, and further demonstrate some of the benefits of the simplified comparisons. Chain pair simplification based on the Hausdorff distance (CPS-2H) is known to be NP-complete, and here we prove that the constrained version (CPS-2H⁺) is also NP-complete. Finally, we discuss future work and implications along with a software library implementation, named the Fréchet-based Protein Alignment & Comparison Toolkit (FPACT).

INDEX TERMS

Proteins, Approximation methods, Bioinformatics, Visualization, Dynamic programming, Approximation algorithms, NP-complete, Protein structure alignment, Protein structure simplification and visualization, Discrete Fréchet distance
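
For readers unfamiliar with the distance measure named in the abstract, the following is a minimal sketch of how the discrete Fréchet distance between two polygonal chains can be computed. It uses the standard dynamic-programming recurrence usually attributed to Eiter and Mannila. It is not the CPS-3F⁺ simplification algorithm and is not taken from FPACT. The function name `discrete_frechet` and the representation of chains as lists of coordinate tuples are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def discrete_frechet(p, q):
    """Return the discrete Fréchet distance between vertex sequences p and q.

    p and q are non-empty sequences of equal-dimension coordinate tuples,
    e.g. the alpha-carbon positions of two protein backbones (hypothetical
    input format chosen for this sketch).
    """
    if not p or not q:
        raise ValueError("both chains must contain at least one vertex")

    n, m = len(p), len(q)
    # ca[i][j] holds the discrete Fréchet distance between the prefixes
    # p[0..i] and q[0..j].
    ca = [[0.0] * m for _ in range(n)]

    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                # Only q can advance along the first row.
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                # Only p can advance along the first column.
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; keep the cheapest prefix.
                best_prefix = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prefix, d)

    return ca[n - 1][m - 1]


if __name__ == "__main__":
    chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    chain_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
    print(discrete_frechet(chain_a, chain_b))
```

The table takes O(nm) time and space for chains of n and m vertices. A chain pair simplification method would additionally choose which vertices to keep on each chain, which this sketch does not attempt.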

CITATION

Tim Wylie, Binhai Zhu, "Protein Chain Pair Simplification under the Discrete Fréchet Distance", *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 10, no. 6, pp. 1372-1383, Nov.-Dec. 2013, doi:10.1109/TCBB.2013.17
