Issue No. 04, July-Aug. 2013 (vol. 10)
pp. 994-1008
Dong-Jun Yu, Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
Jun Hu, Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
Jing Yang, Key Lab. of Syst. Control & Inf. Process., Shanghai Jiao Tong Univ., Shanghai, China
Hong-Bin Shen, Key Lab. of Syst. Control & Inf. Process., Shanghai Jiao Tong Univ., Shanghai, China
Jinhui Tang, Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
Jing-Yu Yang, Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
ABSTRACT
Accurately identifying protein-ligand binding sites, or pockets, is important for both protein function analysis and drug design. Although much progress has been made, challenges remain, especially when the 3D structure of the target protein is unavailable or no homology templates can be found in the library, in which case template-based methods are hard to apply. In this paper, we report a new ligand-specific, template-free predictor called TargetS for targeting protein-ligand binding sites from primary sequences. TargetS first predicts the binding residues along the sequence with a ligand-specific strategy and then identifies the binding sites from the predicted binding residues through a recursive spatial clustering algorithm. Protein evolutionary information, predicted protein secondary structure, and ligand-specific binding propensities of residues are combined to construct discriminative features; an improved AdaBoost classifier ensemble scheme based on random undersampling is proposed to deal with the severe imbalance between positive (binding) and negative (nonbinding) samples. Experimental results demonstrate that TargetS achieves high performance and outperforms many existing predictors. The TargetS web server and data sets are freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/TargetS/
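To illustrate the random-undersampling idea mentioned in the abstract, the following Python sketch builds several balanced training subsets from imbalanced binding/nonbinding labels and averages the scores of hypothetical base classifiers. It is not the TargetS implementation: the function names, the toy labels, and the plain score averaging are assumptions made for illustration, and the improved AdaBoost scheme itself is not reproduced here.

```python
import random


def undersampled_subsets(labels, n_subsets, seed=0):
    """Return n_subsets index lists; each holds every positive index
    plus an equally sized random draw of negative indices."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("expected a non-empty minority positive class")
    # Sampling without replacement keeps each subset balanced 1:1.
    return [sorted(pos + rng.sample(neg, len(pos))) for _ in range(n_subsets)]


def mean_ensemble(score_lists):
    """Average per-residue scores produced by several base classifiers."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


# Toy labels (made-up data): 1 = binding residue, 0 = nonbinding residue.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
subsets = undersampled_subsets(labels, n_subsets=3)

# Hypothetical scores from two base classifiers for two residues.
combined = mean_ensemble([[0.0, 1.0], [1.0, 1.0]])
```

Each subset would be used to train one base classifier, so every classifier sees all positive samples while the negatives vary from subset to subset.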
INDEX TERMS
sequences, bioinformatics, bonds (chemical), learning (artificial intelligence), molecular biophysics, molecular configurations, proteins, sampling methods, binding sample-nonbinding sample imbalance problem, template-free predictor design, targeting protein-ligand binding site, accurate protein-ligand binding site identification, accurate pocket identification, protein function analysis, drug design, target protein 3D structure, homology template, template-based method application, ligand-specific template-free predictor, TargetS predictor, primary sequence, sequence binding residue prediction, ligand-specific strategy, recursive spatial clustering algorithm, protein evolutionary information, protein secondary structure prediction, residue ligand-specific binding propensity, discriminative feature construction, improved AdaBoost classifier ensemble scheme, random undersampling, positive sample-negative sample imbalance problem, Training, Feature extraction, Protein sequence, Metals, Bioinformatics, spatial clustering, Protein-ligand binding sites, ligand-specific prediction model, template-free, classifier ensemble
CITATION
Dong-Jun Yu, Jun Hu, Jing Yang, Hong-Bin Shen, Jinhui Tang, Jing-Yu Yang, "Designing Template-Free Predictor for Targeting Protein-Ligand Binding Sites with Classifier Ensemble and Spatial Clustering", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 994-1008, July-Aug. 2013, doi:10.1109/TCBB.2013.104