Issue No. 02 - March/April (2012 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.114
U. B. Angadi, Nat. Inst. of Animal Nutrition & Physiol., Kalasalingam Univ., Bangalore, India
M. Venkatesulu, Dept. of Comput. Applic., Kalasalingam Univ., Srivilliputtur, India
A major research direction in bioinformatics is assigning superfamily classifications to a given set of proteins. Such a classification reflects structural, evolutionary, and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analysis of proteins, yet a large number of proteins remain unclassified. In this study, we propose an unsupervised machine learning approach to classify and assign a given set of proteins to SCOP superfamilies. We construct a database and a similarity matrix from P-values obtained in an all-against-all BLAST run, then train a network with the ART2 unsupervised learning algorithm, using the rows of the similarity matrix as input vectors. The trained network classifies the proteins with F-measure values ranging from 0.82 to 0.97. The performance of ART2 is compared with that of spectral clustering, random forest, SVM, and HHpred. ART2 performs better than all of these methods except HHpred, which performs better than ART2 and has a smaller sum of errors than the other methods evaluated.
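To make the data flow in the abstract concrete, the following is a minimal sketch of how a similarity matrix might be assembled from pairwise BLAST P-values and how an F-measure could be computed for one cluster. It is not the authors' implementation. The function names, the `-log10(P)` transform, the cap of 50, and the toy hits are all assumptions made for illustration; the abstract does not state how P-values are converted to similarities.

```python
import math


def similarity_matrix(ids, hits, cap=50.0):
    """Build a symmetric similarity matrix from pairwise BLAST P-values.

    ids  : list of protein identifiers (defines row/column order)
    hits : iterable of (query_id, subject_id, p_value) tuples
    cap  : upper bound on the score, also used when p_value is 0
           (assumed value, not taken from the paper)
    """
    index = {pid: i for i, pid in enumerate(ids)}
    n = len(ids)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = cap  # a protein is maximally similar to itself
    for query, subject, p in hits:
        # Assumed transform: smaller P-value -> larger similarity.
        score = cap if p <= 0.0 else min(cap, max(0.0, -math.log10(p)))
        i, j = index[query], index[subject]
        # Keep the strongest hit seen in either direction.
        best = max(sim[i][j], score)
        sim[i][j] = sim[j][i] = best
    return sim


def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for one predicted cluster
    compared against one reference superfamily (both sets of IDs)."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical identifiers and P-values).
ids = ["p1", "p2", "p3"]
hits = [("p1", "p2", 1e-30), ("p2", "p1", 1e-28), ("p1", "p3", 0.5)]
sim = similarity_matrix(ids, hits)
input_vectors = sim  # each row is one protein's input vector
```

Each row of `sim` would then serve as one input vector to a clustering method such as ART2, and `f_measure` could score each resulting cluster against its SCOP superfamily.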
Proteins, Databases, Training, Support vector machines, Matrices, Hidden Markov models, Bioinformatics
U. B. Angadi and M. Venkatesulu, "Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 601-608, 2012.