Issue No.06 - Nov.-Dec. (2012 vol.9)
pp: 1766-1775
Xin Ma, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jing Guo, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Hong-De Liu, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jian-Ming Xie, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Xiao Sun, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
ABSTRACT
The recognition of DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. Therefore, a prediction method, DNABR (DNA Binding Residues), is proposed for predicting DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are proposed in this study. The first combines evolutionary information with the conservation of the physicochemical properties of the amino acids. The second reflects the dependency between amino acids at different sequence positions with regard to their polarity-charge and hydrophobic properties. These two features, together with an orthogonal binary vector that reflects the characteristics of the 20 amino acid types, are used to build DNABR, a model for predicting DNA-binding residues in proteins. The DNABR model achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with 68.47 percent sensitivity (SE) and 98.16 percent specificity (SP). Comparisons of the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches show that DNABR performs well in detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
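For readers unfamiliar with the four measures quoted in the abstract, the following is a minimal sketch of how SE, SP, ACC, and MCC are conventionally computed from the confusion-matrix counts of a binary (binding vs. non-binding) residue classifier. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations; they are not the paper's data, and the paper's exact evaluation protocol is not described in the abstract.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return SE, SP, ACC and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding
    tn/fp: non-binding residues predicted as non-binding / binding
    """
    def ratio(num, den):
        # Avoid division by zero when a class or prediction set is empty.
        return num / den if den else 0.0

    se = ratio(tp, tp + fn)                    # sensitivity (recall on binding)
    sp = ratio(tn, tn + fp)                    # specificity (recall on non-binding)
    acc = ratio(tp + tn, tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)      # Matthews correlation coefficient
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the call.
metrics = binary_metrics(tp=60, fp=10, tn=90, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because binding residues are usually a minority class, ACC alone can look high even when few binding residues are found; MCC accounts for all four counts, which is one reason it is commonly reported alongside SE and SP for this kind of task.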
INDEX TERMS
Amino acids, Proteins, Correlation, Predictive models, Radio frequency, DNA, Evolutionary computation, evolutionary information, DNA-binding residues, random forest, physicochemical property
CITATION
Xin Ma, Jing Guo, Hong-De Liu, Jian-Ming Xie, Xiao Sun, "Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1766-1775, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.106