Issue No. 03 - July-September (2008 vol. 5)
The subcellular locations of proteins are important functional annotations, so an effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and amino-acid composition, even after most of the homologous sequences have been removed. This paper also demonstrates that the performance of PairProSVM is sensitive (and somewhat proportional) to the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on Reinhardt and Hubbard's, Huang and Li's, and Gardy et al.'s protein datasets. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the other methods compared in this paper.
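The role of Mercer's condition mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the score matrix below is invented for illustration, and the names `scores`, `quadratic_form`, and `inner_product_kernel` are hypothetical. The sketch assumes that each row of a pairwise score matrix serves as one protein's feature vector, and it compares using the raw scores as a kernel with using inner products of those feature vectors. Mercer's condition requires the kernel matrix K to satisfy v^T K v >= 0 for every vector v.

```python
# Hypothetical pairwise profile-alignment scores for three proteins.
# scores[i][j] is an assumed score between protein i and protein j;
# the numbers are invented for illustration only.
scores = [
    [5.0, 6.0, 1.0],
    [6.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
]


def quadratic_form(matrix, v):
    """Return v^T * matrix * v for a square matrix and a vector."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))


def inner_product_kernel(features):
    """Return the Gram matrix K[i][j] = <features[i], features[j]>."""
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]


# Row i of the score matrix is treated as the feature vector of protein i.
feature_vectors = scores
kernel = inner_product_kernel(feature_vectors)

# A probe vector that exposes the difference between the two matrices.
v = [1.0, -1.0, 0.0]
raw_value = quadratic_form(scores, v)      # negative here: raw scores violate Mercer's condition
kernel_value = quadratic_form(kernel, v)   # a squared norm, so it cannot be negative

print(raw_value, kernel_value)
```

In this toy case the raw score matrix yields a negative quadratic form, so it is not a valid Mercer kernel, whereas the inner-product kernel built from the same scores is positive semidefinite by construction, because v^T K v equals the squared length of a vector.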
Subcellular localization, profile alignment, kernel methods, support vector machines, Mercer condition
M. Mak, S. Kung and J. Guo, "PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 416-422, 2008.