Issue No.03 - July-September (2008 vol.5)
pp: 416-422
ABSTRACT
The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. It was found that PairProSVM outperforms methods based on sequence alignment and amino-acid composition, even when most of the homologous sequences have been removed. This paper also demonstrates that the performance of PairProSVM is sensitive (and somewhat proportional) to the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on Reinhardt and Hubbard's, Huang and Li's, and Gardy et al.'s protein datasets. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the other methods compared in this paper.
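The two ideas in the abstract, score vectors as SVM features and the Mercer condition on the kernel matrix, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 3x3 score matrix is invented rather than produced by PSI-BLAST profile alignment, the function names are hypothetical, and the inner-product kernel on score vectors is one common choice that the abstract itself does not specify. The check is also binary (positive semidefinite or not), whereas the abstract speaks of a degree of meeting the condition.

```python
# Toy illustration only: the scores are invented, not profile-alignment output.

def score_vectors(scores):
    """One feature vector per sequence: its scores against all training sequences."""
    return [list(row) for row in scores]

def gram_matrix(vectors):
    """Inner-product kernel on score vectors: K[i][j] = <v_i, v_j> (assumed kernel)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def is_symmetric(m, tol=1e-9):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))

def is_positive_semidefinite(m, jitter=1e-9):
    """Finite-sample Mercer check: symmetric, and Cholesky of m + jitter*I succeeds."""
    if not is_symmetric(m):
        return False
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                d = m[i][i] + jitter - s
                if d <= 0:
                    return False  # a non-positive pivot means m is not PSD
                chol[i][j] = d ** 0.5
            else:
                chol[i][j] = (m[i][j] - s) / chol[j][j]
    return True

# Hypothetical pairwise alignment scores among three training sequences.
raw_scores = [
    [5.0, 1.0, 6.0],
    [1.0, 4.0, 0.0],
    [6.0, 0.0, 5.0],
]

features = score_vectors(raw_scores)
kernel = gram_matrix(features)

print(is_positive_semidefinite(raw_scores))
print(is_positive_semidefinite(kernel))
```

In this toy case the raw score matrix is symmetric but indefinite, so using it directly as a kernel would violate Mercer's condition, while the matrix of inner products between score vectors is positive semidefinite by construction.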
INDEX TERMS
Subcellular localization, profile alignment, kernel methods, support vector machines, Mercer condition
CITATION
Man-Wai Mak, Jian Guo, Sun-Yuan Kung, "PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 3, pp. 416-422, July-September 2008, doi:10.1109/TCBB.2007.70256
REFERENCES
[1] K. Nakai, “Protein Sorting Signals and Prediction of Subcellular Localization,” Advances in Protein Chemistry, vol. 54, no. 1, pp. 277-344, 2000.
[2] K. Nakai and M. Kanehisa, “Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria,” Proteins: Structure, Function, and Genetics, vol. 11, no. 2, pp. 95-110, 1991.
[3] K. Nakai and M. Kanehisa, “A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells,” Genomics, vol. 14, pp. 897-911, 1992.
[4] O. Emanuelsson, H. Nielsen, S. Brunak, and G. von Heijne, “Predicting Subcellular Localization of Proteins Based on Their N-Terminal Amino Acid Sequence,” J. Molecular Biology, vol. 300, pp. 1005-1016, 2000.
[5] H. Nielsen, J. Engelbrecht, S. Brunak, and G. von Heijne, “A Neural Network Method for Identification of Prokaryotic and Eukaryotic Signal Peptides and Prediction of Their Cleavage Sites,” Int'l J. Neural Systems, vol. 8, pp. 581-599, 1997.
[6] H. Nielsen, S. Brunak, and G. von Heijne, “Machine Learning Approaches for the Prediction of Signal Peptides and Other Protein Sorting Signals,” Protein Eng., vol. 12, no. 1, pp. 3-9, 1999.
[7] P. Horton, K.J. Park, T. Obayashi, and K. Nakai, “Protein Subcellular Localization Prediction with WoLF PSORT,” Proc. Fourth Ann. Asia Pacific Bioinformatics Conf. (APBC '06), pp. 39-48, 2006.
[8] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, “Basic Local Alignment Search Tool,” J. Molecular Biology, vol. 215, pp. 403-410, 1990.
[9] H. Nakashima and K. Nishikawa, “Discrimination of Intracellular and Extracellular Proteins Using Amino Acid Composition and Residue-Pair Frequencies,” J. Molecular Biology, vol. 238, pp. 54-61, 1994.
[10] J. Cedano, P. Aloy, J.A. Perez-Pons, and E. Querol, “Relation between Amino Acid Composition and Cellular Location of Proteins,” J. Molecular Biology, vol. 266, pp. 594-600, 1997.
[11] A. Reinhardt and T. Hubbard, “Using Neural Networks for Prediction of the Subcellular Location of Proteins,” Nucleic Acids Research, vol. 26, pp. 2230-2236, 1998.
[12] S.J. Hua and Z.R. Sun, “Support Vector Machine Approach for Protein Subcellular Localization Prediction,” Bioinformatics, vol. 17, pp. 721-728, 2001.
[13] Z. Yuan, “Prediction of Protein Subcellular Locations Using Markov Chain Models,” FEBS Letters, vol. 451, no. 1, pp. 23-26, 1999.
[14] K.J. Park and M. Kanehisa, “Prediction of Protein Subcellular Locations by Support Vector Machines Using Compositions of Amino Acids and Amino Acid Pairs,” Bioinformatics, vol. 19, no. 13, pp. 1656-1663, 2003.
[15] Y. Huang and Y.D. Li, “Prediction of Protein Subcellular Locations Using Fuzzy K-NN Method,” Bioinformatics, vol. 20, no. 1, pp. 21-28, 2004.
[16] K.C. Chou, “Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition,” Proteins: Structure, Function, and Genetics, vol. 43, pp. 246-255, 2001.
[17] Y.D. Cai and K.C. Chou, “Predicting Subcellular Localization of Proteins in a Hybridization Space,” Bioinformatics, vol. 20, pp. 1151-1156, 2004.
[18] R. Nair and B. Rost, “Sequence Conserved for Subcellular Localization,” Protein Science, vol. 11, pp. 2836-2847, 2002.
[19] Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, B. Poulin, J. Anvik, C. Macdonell, and R. Eisner, “Predicting Subcellular Localization of Proteins Using Machine-Learned Classifiers,” Bioinformatics, vol. 20, no. 4, pp. 547-556, 2004.
[20] J.K. Kim, G.P.S. Raghava, S.Y. Bang, and S. Choi, “Prediction of Subcellular Localization of Proteins Using Pairwise Sequence Alignment and Support Vector Machine,” Pattern Recognition Letters, vol. 27, no. 9, pp. 996-1001, 2006.
[21] J.L. Gardy, C. Spencer, K. Wang, M. Ester, G.E. Tusnady, I. Simon, S.J. Hua, K. deFays, C. Lambert, K. Nakai, and F.S.L. Brinkman, “PSORT-B: Improving Protein Subcellular Localization Prediction for Gram-Negative Bacteria,” Nucleic Acids Research, vol. 31, no. 13, pp. 3613-3617, 2003.
[22] M. Bhasin and G.P.S. Raghava, “ESLpred: SVM-Based Method for Subcellular Localization of Eukaryotic Proteins Using Dipeptide Composition and PSI-BLAST,” Nucleic Acids Research, vol. 32, Webserver Issue, pp. 414-419, 2004.
[23] A. Garg, M. Bhasin, and G.P.S. Raghava, “SVM-Based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions, Their Order and Similarity Search,” J. Biological Chemistry, vol. 280, pp. 14427-14432, 2005.
[24] S. Busuttil, J. Abela, and G.J. Pace, “Support Vector Machines with Profile-Based Kernels for Remote Protein Homology Detection,” Genome Informatics, vol. 15, no. 2, pp. 191-200, 2004.
[25] H. Rangwala and G. Karypis, “Profile-Based Direct Kernels for Remote Homology Detection and Fold Recognition,” Bioinformatics, vol. 21, no. 23, pp. 4239-4247, 2005.
[26] R. Kuang, E. Ie, K. Wang, K. Wang, M. Siddiqi, Y. Freund, and C. Leslie, “Profile-Based String Kernels for Remote Homology Detection and Motif Extraction,” J. Bioinformatics and Computational Biology, vol. 3, pp. 527-550, 2005.
[27] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman, “Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs,” Nucleic Acids Research, vol. 25, pp. 3389-3402, 1997.
[28] S. Henikoff and J.G. Henikoff, “Amino Acid Substitution Matrices from Protein Blocks,” Proc. Nat'l Academy of Sciences, pp. 10915-10919, 1992.
[29] T.F. Smith and M.S. Waterman, “Comparison of Biosequences,” Advances in Applied Math., vol. 2, pp. 482-489, 1981.
[30] O. Gotoh, “An Improved Algorithm for Matching Biological Sequences,” J. Molecular Biology, vol. 162, pp. 705-708, 1982.
[31] E.G. Shpaer, M. Robinson, D. Yee, J.D. Candlin, R. Mines, and T. Hunkapiller, “Sensitivity and Selectivity in Protein Similarity Searches: A Comparison of Smith-Waterman in Hardware to BLAST and FASTA,” Genomics, vol. 38, pp. 179-191, 1996.
[32] L. Liao and W.S. Noble, “Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships,” J. Computational Biology, vol. 10, no. 6, pp. 857-868, 2003.
[33] L. Rychlewski, B. Zhang, and A. Godzik, “Fold and Function Predictions for Mycoplasma Genitalium Proteins,” Folding and Design, vol. 3, no. 4, pp. 229-238, 1998.
[34] B. Boeckmann, A. Bairoch, R. Apweiler, M.C. Blatter, A. Estreicher, E. Gasteiger, M.J. Martin, K. Michoud, C. O'Donovan, I. Phan, S. Pilbout, and M. Schneider, “The SWISS-PROT Protein Knowledgebase and Its Supplement TrEMBL in 2003,” Nucleic Acids Research, vol. 31, pp. 365-370, 2003.
[35] B.W. Matthews, “Comparison of Predicted and Observed Secondary Structure of T4 Phage Lysozyme,” Biochimica et Biophysica Acta, vol. 405, pp. 442-451, 1975.
[36] M. Bhasin, A. Garg, and G.P.S. Raghava, “PSLpred: Prediction of Subcellular Localization of Bacterial Proteins,” Bioinformatics, vol. 21, no. 10, pp. 2522-2524, 2005.
[37] C.S. Yu, C.J. Lin, and J.K. Hwang, “Predicting Subcellular Localization of Proteins for Gram-Negative Bacteria by Support Vector Machines Based on N-Peptide Compositions,” Protein Science, vol. 13, pp. 1402-1406, 2004.
[38] S.Y. Kung and M.W. Mak, “Feature Selection for Pairwise Scoring Kernels with Applications to Protein Subcellular Localization,” Proc. IEEE Int'l Conf. Acoustic, Speech, and Signal Processing (ICASSP '07), pp. 569-572, 2007.
[39] P. Donnes and A. Hoglund, “Predicting Protein Subcellular Localization: Past, Present, and Future,” Genomics, Proteomics, and Bioinformatics, vol. 2, no. 4, pp. 209-215, 2004.