Issue No.02 - March/April (2012 vol.9)
pp: 601-608
U. B. Angadi , Nat. Inst. of Animal Nutrition & Physiol., Kalasalingam Univ., Bangalore, India
M. Venkatesulu , Dept. of Comput. Applic., Kalasalingam Univ., Srivilliputtur, India
ABSTRACT
One of the major research directions in bioinformatics is assigning a superfamily classification to a given set of proteins. The classification reflects structural, evolutionary, and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analysis of proteins, yet a large number of proteins remain unclassified. In this study, we propose an unsupervised machine learning approach to classify a given set of proteins and assign them to SCOP superfamilies. We construct a database and a similarity matrix from P-values obtained in an all-against-all BLAST run, then train a network with the ART2 unsupervised learning algorithm, using the rows of the similarity matrix as input vectors. The trained network classifies the proteins with F-measure values ranging from 0.82 to 0.97. We compare the performance of ART2 with that of spectral clustering, random forest, SVM, and HHpred. ART2 performs better than all of these except HHpred; HHpred performs better than ART2, and its sum of errors is smaller than that of the other methods evaluated.
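As an illustration of the similarity-matrix step, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how P-values are turned into similarity scores, so the `-log10(P)` transform, the `cap` value, the symmetrization rule, and the names `similarity_matrix`, `ids`, and `pvalues` are all assumptions made for this example. Each row of the resulting matrix would serve as one input vector for a clustering network such as ART2.

```python
import math

def similarity_matrix(ids, pvalues, cap=100.0):
    """Build a symmetric similarity matrix from pairwise P-values.

    ids      -- list of protein identifiers (hypothetical names)
    pvalues  -- dict mapping (id_a, id_b) to a P-value from an
                all-against-all comparison; missing pairs mean "no hit"
    cap      -- upper bound on a score, also used when P is 0 (assumption)
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), p in pvalues.items():
        # Assumed transform: -log10(P), clipped to the range [0, cap].
        score = cap if p <= 0.0 else min(cap, max(0.0, -math.log10(p)))
        i, j = index[a], index[b]
        # Hits in the two directions can differ, so keep the stronger one.
        best = max(matrix[i][j], score)
        matrix[i][j] = matrix[j][i] = best
    for i in range(n):
        matrix[i][i] = cap  # a sequence is treated as maximally similar to itself
    return matrix

# Toy input: three hypothetical domains and a few made-up P-values.
ids = ["d1", "d2", "d3"]
pvalues = {("d1", "d2"): 1e-30, ("d2", "d1"): 1e-25, ("d2", "d3"): 0.5}
rows = similarity_matrix(ids, pvalues)  # rows[i] is the input vector for ids[i]
```

Pairs with no reported hit keep a score of 0, so unrelated proteins contribute nothing to each other's row vectors.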
INDEX TERMS
Proteins, Databases, Training, Support vector machines, Matrices, Hidden Markov models, Bioinformatics, Unsupervised learning, Protein classification, SCOP, ART2 neural network
CITATION
U. B. Angadi, M. Venkatesulu, "Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 601-608, March/April 2012, doi:10.1109/TCBB.2011.114