Three-Dimensional Shape-Structure Comparison Method for Protein Classification
July-September 2006 (vol. 3 no. 3)
pp. 193-207
In this paper, a 3D shape-based approach is presented for the efficient search, retrieval, and classification of protein molecules. The method relies primarily on the geometric 3D structure of the proteins, which is produced from the corresponding PDB files, and secondarily on their primary and secondary structure. After the 3D structures are normalized with respect to translation and scaling, the Spherical Trace Transform is applied to them to produce geometry-based descriptor vectors, which are completely rotation invariant and perfectly describe their 3D shape. Additionally, characteristic attributes of the primary and secondary structure of the protein molecules are extracted, forming attribute-based descriptor vectors. The two kinds of descriptor vectors are weighted and combined into an integrated descriptor vector. Three classification methods are tested. A part of the FSSP/DALI database, which provides a structural classification of the proteins, is used as the ground truth to evaluate the classification accuracy of the proposed method. The experimental results show that the proposed method achieves more than 99 percent classification accuracy while remaining much simpler and faster than the DALI method.
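
The pipeline outlined in the abstract (pose normalization, weighting of the descriptor vectors, and classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Spherical Trace Transform itself is not reproduced here, the weights `w_geometry` and `w_attributes` are hypothetical values, and the nearest-neighbor classifier is an assumed stand-in, since the abstract does not name the three classification methods that were tested. All function names are illustrative.

```python
import math


def normalize_pose(points):
    """Center 3D points on their centroid and scale them so that the
    farthest point lies at distance 1 (an assumed normalization scheme)."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if radius == 0:
        # All points coincide; nothing to scale.
        return centered
    return [[c / radius for c in p] for p in centered]


def integrate_descriptors(geometry, attributes, w_geometry=0.7, w_attributes=0.3):
    """Weight the geometry-based and attribute-based descriptor vectors and
    concatenate them. The default weights are hypothetical placeholders."""
    return [w_geometry * g for g in geometry] + [w_attributes * a for a in attributes]


def nearest_neighbor_label(query, labeled_vectors):
    """Return the label of the stored vector closest to `query` in Euclidean
    distance. `labeled_vectors` is a list of (vector, label) pairs."""
    closest = min(labeled_vectors, key=lambda item: math.dist(query, item[0]))
    return closest[1]
```

In such a sketch, atom coordinates read from a PDB file would first pass through `normalize_pose`, a shape descriptor would then be computed from the normalized coordinates, and the result of `integrate_descriptors` would be compared against descriptors of proteins whose class is already known.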


Index Terms:
Information search and retrieval, classification, protein databases.
Citation:
Petros Daras, Dimitrios Zarpalas, Apostolos Axenopoulos, Dimitrios Tzovaras, Michael Gerassimos Strintzis, "Three-Dimensional Shape-Structure Comparison Method for Protein Classification," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 3, pp. 193-207, July-Sept. 2006, doi:10.1109/TCBB.2006.43