Issue No.05 - Sept.-Oct. (2012 vol.9)
pp: 1451-1458
Nicolas Bonnel, IRISA, Université de Bretagne Sud, Vannes, France
Pierre-Francois Marteau, IRISA, Université de Bretagne Sud, Vannes, France
ABSTRACT
In the last two decades, many protein 3D shapes have been discovered, characterized, and made available through the Protein Data Bank (PDB), which continues to grow very quickly. New scalable methods are thus urgently required to search the PDB efficiently. This paper presents an approach called LNA (Laplacian Norm Alignment) that performs a structural comparison of two proteins using dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated by applying a Laplacian operator to the graph defined by the adjacency matrix of the residues. The weighted Laplacian operator we use estimates, at various scales, local deformations of the topology where each residue is located. On several benchmarks widely shared by the community, we obtain results qualitatively similar to those of competing approaches, but with an algorithm one or two orders of magnitude faster. On a single recent graphics processing unit (GPU), 180,000 protein comparisons can be done within 1 second, which makes our algorithm very scalable and suitable for real-time database querying across the web.
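The two steps named in the abstract (per-residue scalar features from a graph Laplacian, then a dynamic programming alignment of the two feature sequences) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the edge weights, the scales, or the scoring scheme, so the 8 Å contact cutoff, the unweighted single-scale Laplacian, the match score, and the gap penalty below are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np


def laplacian_norm_features(coords, cutoff=8.0):
    """One scalar per residue: norm of a graph Laplacian applied to positions.

    Assumption: unweighted contact graph with a single distance cutoff.
    The paper uses a weighted operator at several scales.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between residues (e.g. C-alpha atoms).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Adjacency matrix: residues closer than the cutoff, no self-loops.
    adj = (dists < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)
    degree = adj.sum(axis=1)
    # Degree-normalised Laplacian: residue position minus mean of its neighbours.
    safe_degree = np.where(degree > 0, degree, 1.0)
    neighbour_mean = (adj @ coords) / safe_degree[:, None]
    delta = np.where(degree[:, None] > 0, coords - neighbour_mean, 0.0)
    return np.linalg.norm(delta, axis=1)


def global_alignment_score(f1, f2, gap=-1.0, scale=1.0):
    """Needleman-Wunsch style global alignment score over scalar features.

    Assumption: a match scores 1 - |difference| / scale and gaps are linear.
    """
    n, m = len(f1), len(f2)
    score = np.zeros((n + 1, m + 1))
    score[:, 0] = gap * np.arange(n + 1)
    score[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1.0 - abs(f1[i - 1] - f2[j - 1]) / scale
            score[i, j] = max(
                score[i - 1, j - 1] + match,  # align residue i with residue j
                score[i - 1, j] + gap,        # gap in the second protein
                score[i, j - 1] + gap,        # gap in the first protein
            )
    return score[n, m]
```

Because the features depend only on inter-residue geometry, they are unchanged by rotating or translating a structure, so two proteins can be compared without first superimposing them. Calling laplacian_norm_features on each coordinate array and passing the two results to global_alignment_score gives a single similarity score per pair; each pair is independent of the others, which is the kind of workload that maps well onto a GPU.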
INDEX TERMS
proteins, biology computing, dynamic programming, graph theory, graphics processing units, molecular biophysics, molecular configurations, web, protein structural comparison, Laplacian characterization, tertiary structure, protein 3D shapes, Protein Data Bank, Laplacian norm alignment, dynamic programming algorithms, graph, adjacency matrix, weighted Laplacian operator, local deformations, topology, graphical processing unit, GPU, real-time database querying, Proteins, Laplace equations, Three dimensional displays, Dynamic programming, Heuristic algorithms, Graphics processing unit, Accuracy, GPU implementation, Proteins, structural comparison, Laplacian, classification
CITATION
Nicolas Bonnel, Pierre-Francois Marteau, "LNA: Fast Protein Structural Comparison Using a Laplacian Characterization of Tertiary Structure", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 5, pp. 1451-1458, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.64