Issue No.02 - March-April (2013 vol.10)
pp: 436-446
Xiao Wang , Dept. of Control Sci. & Eng., Tongji Univ., Shanghai, China
Guo-Zheng Li , Dept. of Control Sci. & Eng., Tongji Univ., Shanghai, China
ABSTRACT
Predicting protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more subcellular location sites. Most existing protein subcellular localization methods handle only single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, these methods adopt a simple strategy: they transform each multilocation protein into multiple single-location proteins, which does not take correlations among different subcellular locations into account. In this paper, a novel method named RALS (multilabel learning via random label selection), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels; instead, it implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that the proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that the proposed method achieves significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is publicly available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/.
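The feature-augmentation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameter `k` (how many labels are drawn per target), the choice to exclude the target label from its own inputs, and the toy data are all assumptions made for illustration. The abstract also does not say how the label inputs are filled in at prediction time, so the sketch covers only the construction of the training sets, one binary task per location as in BR.

```python
import random


def select_label_subsets(n_labels, k, seed=0):
    """For each target label j, draw k other labels at random (hypothetical helper)."""
    rng = random.Random(seed)
    subsets = {}
    for j in range(n_labels):
        # Assumption: a label is never used as an input feature for itself.
        others = [l for l in range(n_labels) if l != j]
        subsets[j] = sorted(rng.sample(others, k))
    return subsets


def augment_features(X, Y, label_subset):
    """Append the selected label columns of Y to every feature vector in X."""
    return [x + [y[l] for l in label_subset] for x, y in zip(X, Y)]


def build_training_sets(X, Y, k, seed=0):
    """Build one binary task per label: (chosen labels, augmented inputs, target column)."""
    n_labels = len(Y[0])
    subsets = select_label_subsets(n_labels, k, seed)
    tasks = {}
    for j, subset in subsets.items():
        X_aug = augment_features(X, Y, subset)
        target = [y[j] for y in Y]
        tasks[j] = (subset, X_aug, target)
    return tasks


# Toy data (made up): 3 proteins, 2 sequence features, 3 location labels (multi-hot).
X = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
tasks = build_training_sets(X, Y, k=1)
```

Each entry of `tasks` could then be passed to any binary classifier. Setting `k=0` leaves the inputs unchanged, which corresponds to plain BR with no label information shared between the per-location classifiers.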
INDEX TERMS
Multilabel learning, Proteins, Random label selection, Benchmark testing, Protein subcellular localization, Multilocation proteins
CITATION
Xiao Wang, Guo-Zheng Li, "Multilabel Learning via Random Label Selection for Protein Subcellular Multilocations Prediction", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 2, pp. 436-446, March-April 2013, doi:10.1109/TCBB.2013.21