Issue No.02 - March/April (2012 vol.9)
pp: 467-475
L. Nanni, Dept. of Inf. Eng., Univ. of Padua, Padova, Italy
A. Lumini, DEIS, Univ. of Bologna, Cesena, Italy
D. Gupta, Struct. & Comput. Biol. Group, Int. Centre for Genetic Eng. & Biotechnol. (ICGEB), New Delhi, India
A. Garg, Struct. & Comput. Biol. Group, Int. Centre for Genetic Eng. & Biotechnol. (ICGEB), New Delhi, India
ABSTRACT
A reliable method for predicting bacterial virulent proteins has several important applications in research aimed at finding novel drug targets and vaccine candidates and at understanding virulence mechanisms in pathogens. In this work, we have studied several feature extraction approaches for representing proteins and propose a novel bacterial virulent protein prediction method based on an ensemble of classifiers, where the features are extracted directly from the amino acid sequence and from the evolutionary information of a given protein. We have evaluated and compared several ensembles obtained by combining six feature extraction methods with several classification approaches based on two general-purpose classifiers (i.e., Support Vector Machine and a variant of input decimated ensemble) and their random subspace versions. An extensive evaluation was performed according to a blind testing protocol, in which the parameters of the system are optimized on the training set and the system is validated on three independent data sets, allowing selection of the best-performing system and demonstrating the validity of the proposed method. The results obtained with the blind test protocol show that, although the best-performing stand-alone method is not the same in every independent data set, the fusion of different methods enhances prediction efficiency in all the tested independent data sets.
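As a rough illustration of the ideas named in the abstract (sequence-derived features, random subspaces, and score fusion), the following minimal sketch may help. It is not the authors' implementation: the plain amino acid composition stands in for the six feature extraction methods, the averaged score stands in for whatever fusion rule the paper actually uses, and all function names and parameters are hypothetical.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Residues outside the 20-letter alphabet are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def random_subspaces(n_features, n_subspaces, fraction, seed=0):
    """Draw feature-index subsets, one per ensemble member.

    Each member of a random subspace ensemble would be trained only on
    the features listed in its subset (the training step is omitted here).
    """
    rng = random.Random(seed)
    size = max(1, int(round(fraction * n_features)))
    return [sorted(rng.sample(range(n_features), size))
            for _ in range(n_subspaces)]


def sum_rule_fusion(scores):
    """Fuse per-classifier scores by averaging them.

    Averaging ranks samples the same way as summing; the choice of this
    rule is an assumption made for the sketch.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    features = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
    subsets = random_subspaces(len(features), n_subspaces=3, fraction=0.5)
    # Hypothetical scores from three classifiers for a single protein.
    fused = sum_rule_fusion([0.2, 0.4, 0.9])
    print(len(features), [len(s) for s in subsets], fused)
```

In a full pipeline, each index subset would select the columns used to train one classifier, and the fused score would be thresholded to label a protein as virulent or non-virulent.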
INDEX TERMS
Proteins, Microorganisms, Feature extraction, Amino acids, Bioinformatics, Computational biology, Encoding, Support vector machines, Virulent proteins, Machine learning, Ensemble of classifiers
CITATION
L. Nanni, A. Lumini, D. Gupta, A. Garg, "Identifying Bacterial Virulent Proteins by Fusing a Set of Classifiers Based on Variants of Chou's Pseudo Amino Acid Composition and on Evolutionary Information", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no. 2, pp. 467-475, March/April 2012, doi:10.1109/TCBB.2011.117