Efficient Approaches for Retrieving Protein Tertiary Structures
July-Aug. 2012 (vol. 9 no. 4)
pp. 1166-1179
Z. Dimov, Microsoft Corp., Seattle, WA, USA
I. Cingovska, Dept. of Comput. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ. in Skopje, Macedonia, Skopje, Macedonia
G. Mirceva, Dept. of Comput. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ. in Skopje, Macedonia, Skopje, Macedonia
D. Davcev, Dept. of Comput. Sci. & Comput. Eng., Ss. Cyril & Methodius Univ. in Skopje, Macedonia, Skopje, Macedonia
The 3D conformation of a protein in space is the main factor determining its function in living organisms. Because of the large number of newly discovered proteins, fast and accurate computational methods for retrieving protein structures are needed. Such methods speed up the process of understanding the structure-to-function relationship, which is crucial in the development of new drugs. Many algorithms address the problem of protein structure retrieval. In this paper, we present several novel approaches for retrieving protein tertiary structures. We first present our voxel-based descriptor, followed by our ray-based descriptors, which are applied to the interpolated protein backbone. We then introduce five novel wavelet descriptors, which perform wavelet transforms on the protein distance matrix. We also propose an efficient algorithm for distance matrix alignment, named Matrix Alignment by Sequence Alignment within Sliding Window (MASASW), which has been shown to be much faster than DALI, CE, and MatAlign. We compared our approaches with one another and with several existing algorithms, and they generally prove to be fast and accurate. MASASW achieves the highest accuracy. The ray-based and wavelet-based descriptors, as well as MASASW, are more accurate than CE.
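As a rough illustration of a wavelet transform applied to a protein distance matrix, the sketch below builds a pairwise distance matrix from backbone coordinates and applies one level of a 2D Haar transform, keeping the low-frequency quadrant as a feature vector. This is not the paper's method: the abstract does not specify the five descriptors, so the choice of the Haar wavelet, the single decomposition level, and all function names here are assumptions made for illustration only.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def haar_step_1d(values):
    """One Haar level: pairwise averages, then pairwise half-differences.

    Assumes len(values) is even.
    """
    half = len(values) // 2
    avg = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    diff = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def haar_step_2d(matrix):
    """Apply one Haar level to every row, then to every column."""
    rows = [haar_step_1d(r) for r in matrix]
    cols = [haar_step_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def approximation_descriptor(coords):
    """Flatten the low-frequency (top-left) quadrant into a feature vector.

    Hypothetical descriptor for illustration; assumes an even number of points.
    """
    transformed = haar_step_2d(distance_matrix(coords))
    half = len(coords) // 2
    return [transformed[i][j] for i in range(half) for j in range(half)]


if __name__ == "__main__":
    # Four made-up collinear points standing in for backbone atoms.
    toy_backbone = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    print(approximation_descriptor(toy_backbone))
```

Two structures could then be compared by a simple vector distance between their descriptors, which is far cheaper than a full pairwise alignment. Real chains differ in length, so some resampling to a fixed size (for example via backbone interpolation) would be needed before such vectors are comparable.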

Index Terms:
wavelet transforms, bioinformatics, biological techniques, matrix algebra, molecular biophysics, molecular configurations, proteins, MASASW, protein tertiary structure retrieval, protein 3D conformations, protein function, computational methods, structure-function relationship, drug development, voxel based descriptor, protein ray based descriptors, interpolated protein backbone, wavelet descriptors, protein distance matrix, distance matrix alignment, Matrix Alignment by Sequence Alignment within Sliding Window, feature extraction, three dimensional displays, amino acids, interpolation, matrix decomposition, information search and retrieval, protein databases
Z. Dimov, I. Cingovska, G. Mirceva, D. Davcev, "Efficient Approaches for Retrieving Protein Tertiary Structures," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 1166-1179, July-Aug. 2012, doi:10.1109/TCBB.2011.138