Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information
Nov.-Dec. 2012 (vol. 9 no. 6)
pp. 1766-1775
Xin Ma, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jing Guo, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Hong-De Liu, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Jian-Ming Xie, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
Xiao Sun, State Key Lab. of Bioelectronics, Southeast Univ., Nanjing, China
The recognition of DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. Therefore, a prediction method, DNABR (DNA Binding Residues), is proposed for predicting DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are proposed in this study. They capture the conservation of the physicochemical properties of amino acids, and the correlation of amino acids between different sequence positions in terms of physicochemical properties. The first type combines evolutionary information with the conservation of physicochemical properties, while the second reflects the dependency effect of amino acids with regard to polarity-charge and hydrophobic properties in the protein sequences. These two features, together with an orthogonal binary vector that reflects the characteristics of the 20 types of amino acids, are used to build DNABR, a model to predict DNA-binding residues in proteins. The DNABR model achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with a sensitivity (SE) of 68.47 percent and a specificity (SP) of 98.16 percent. Comparisons among the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches show that DNABR performs well at detecting binding residues in putative DNA-binding proteins. The DNABR web-server system is freely available at http://www.cbi.seu.edu.cn/DNABR/.
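For readers unfamiliar with the four reported measures, the sketch below shows how MCC, ACC, SE, and SP are conventionally computed from the counts of a binary confusion matrix. It uses the standard textbook definitions, not code from the paper. The function name `binary_metrics` and the residue counts in the usage lines are hypothetical and chosen only for illustration; they are not the DNABR results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding
    tn/fp: non-binding residues predicted as non-binding / binding
    """
    total = tp + fp + tn + fn
    se = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    acc = (tp + tn) / total if total else 0.0   # overall accuracy
    # Matthews correlation coefficient; defined as 0 when any marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts (assumption, for illustration only):
# 100 binding residues and 900 non-binding residues.
metrics = binary_metrics(tp=70, fp=20, tn=880, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because binding residues are usually a small minority of all residues, accuracy and specificity can be high even when many binding residues are missed. This is one reason MCC, which uses all four counts, is commonly reported alongside them.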
Index Terms:
sensitivity, biochemistry, biology computing, DNA, feature extraction, genetics, genomics, hydrophobicity, Internet, molecular biophysics, molecular configurations, pattern classification, proteins, random sequences, DNABR web-server system, sequence-based prediction, DNA-binding residue recognition, correlation information, conservation information, DNA-protein interactions, gene expression, drug design guidance, protein sequences, random forest classifier, sequence-based features, physicochemical properties, amino acids, evolutionary information, polarity charge, hydrophobic properties, orthogonal binary vector, Matthew correlation coefficient, Correlation, Predictive models, Radio frequency, Evolutionary computation, DNA-binding residues, random forest, physicochemical property
Citation:
Xin Ma, Jing Guo, Hong-De Liu, Jian-Ming Xie, Xiao Sun, "Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1766-1775, Nov.-Dec. 2012, doi:10.1109/TCBB.2012.106