Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information
Issue No. 06 - Nov.-Dec. (2012 vol. 9)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2012.106
Xin Ma, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jing Guo, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Hong-De Liu, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jian-Ming Xie, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Xiao Sun, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
The recognition of DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. This study therefore proposes DNABR (DNA Binding Residues), a method that predicts DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are introduced. The first combines evolutionary information with the conservation of the physicochemical properties of amino acids. The second captures the correlation between amino acids at different sequence positions, reflecting their dependency with regard to polarity-charge and hydrophobic properties. These two features, together with an orthogonal binary vector that encodes the 20 amino acid types, are used to build the DNABR model. DNABR achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with a sensitivity (SE) of 68.47 percent and a specificity (SP) of 98.16 percent. Comparisons across the individual features show that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches clearly show that DNABR has excellent prediction performance for detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
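As an illustration of two ideas named in the abstract, the following minimal Python sketch shows an orthogonal binary (one-hot) encoding over the 20 amino acid types and the standard definitions of SE, SP, ACC, and MCC computed from confusion-matrix counts. It is not the DNABR implementation: the function names, the alphabetical ordering of the one-letter residue codes, and the example counts are assumptions made here for illustration only.

```python
from math import sqrt

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
# The ordering used by DNABR itself is not stated in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-element orthogonal binary vector for one residue."""
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown residue: {residue!r}")
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute SE, SP, ACC and MCC from confusion-matrix counts.

    tp/fn count binding residues predicted correctly/incorrectly;
    tn/fp count non-binding residues predicted correctly/incorrectly.
    """
    se = _ratio(tp, tp + fn)                    # sensitivity (recall)
    sp = _ratio(tn, tn + fp)                    # specificity
    acc = _ratio(tp + tn, tp + tn + fp + fn)    # overall accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, denom)      # Matthews correlation
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(one_hot("K"))
    print(binary_metrics(tp=40, tn=50, fp=0, fn=10))
```

Because binding residues are typically far rarer than non-binding ones, accuracy alone can look high even when sensitivity is modest; MCC uses all four counts and is therefore a more balanced summary for this kind of imbalanced task.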
Amino acids, Proteins, Correlation, Predictive models, Radio frequency, DNA, Evolutionary computation
Xin Ma, Jing Guo, Hong-De Liu, Jian-Ming Xie and Xiao Sun, "Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1766-1775, Nov.-Dec. 2012.