Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines
Issue No. 03 - May-June (2013 vol. 10)
DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.75
Jian-Sheng Wu, Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
Zhi-Hua Zhou, Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
The recognition of microRNA (miRNA)-binding residues in proteins helps in understanding how miRNAs silence their target genes. Existing computational methods are difficult to apply to predicting miRNA-binding residues in proteins because of the lack of training examples. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning deals with methods that automatically exploit unlabeled data, in addition to labeled data, to improve learning performance without human intervention. In addition, miRNA-binding proteins almost always contain far fewer binding residues than nonbinding residues, and cost-sensitive learning is regarded as a good solution to this class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information from the amino acid sequence (position-specific scoring matrices), conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. CS-LapSVM achieves good performance, with an F1 score of 26.23 ± 2.55% and an AUC of 0.805 ± 0.020, which is superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
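To make the combination of manifold regularization and cost-sensitive loss concrete, here is a minimal sketch of a cost-sensitive Laplacian SVM trained in the primal with a squared hinge loss. It is an illustration under stated assumptions, not the authors' CS-LapSVM: the squared-hinge primal formulation, the RBF kernel, the k-nearest-neighbour graph Laplacian, and every name and parameter (`fit_cs_lapsvm`, `c_pos`, `c_neg`, `gamma_A`, `gamma_I`, `kernel_gamma`, `k`) are hypothetical choices. The sketch minimizes 1/2 [ sum over labeled i of c_i max(0, 1 - y_i f(x_i))^2 + gamma_A alpha^T K alpha + gamma_I f^T L f ], where f = K alpha + b is evaluated on labeled plus unlabeled points and c_i is a per-class misclassification cost. Plain vectors stand in for the per-residue hybrid features (PSSM, HKM conservation, interaction propensities).

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def knn_laplacian(X, k, gamma):
    """Unnormalised graph Laplacian L = D - W of a symmetrised k-NN graph."""
    n = X.shape[0]
    W = rbf_kernel(X, X, gamma)
    np.fill_diagonal(W, 0.0)
    # Keep each point's k most similar neighbours, then symmetrise the graph.
    nbrs = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nbrs] = True
    W = np.where(mask | mask.T, W, 0.0)
    return np.diag(W.sum(1)) - W


def fit_cs_lapsvm(X_lab, y, X_unl, c_pos, c_neg, gamma_A=1e-2, gamma_I=1e-2,
                  kernel_gamma=0.5, k=5, max_iter=50):
    """Hypothetical primal cost-sensitive LapSVM (squared hinge), active-set Newton steps."""
    X = np.vstack([X_lab, X_unl])
    n, l = X.shape[0], X_lab.shape[0]
    K = rbf_kernel(X, X, kernel_gamma)
    L = knn_laplacian(X, k, kernel_gamma)
    y_full = np.zeros(n)
    y_full[:l] = y
    # Per-example misclassification costs; unlabeled points carry no loss term.
    cost = np.zeros(n)
    cost[:l] = np.where(y > 0, c_pos, c_neg)

    def objective(alpha, f):
        hinge = np.maximum(0.0, 1.0 - y * f[:l])
        return 0.5 * (np.sum(cost[:l] * hinge ** 2)
                      + gamma_A * alpha @ K @ alpha + gamma_I * f @ L @ f)

    active = np.zeros(n, dtype=bool)
    active[:l] = True
    best = None
    for _ in range(max_iter):
        d = cost * active                      # diagonal of D (costs on the active set)
        # Stationarity: D(f - y) + gamma_A*alpha + gamma_I*L f = 0 and 1^T D (f - y) = 0.
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = d[:, None] * K + gamma_A * np.eye(n) + gamma_I * L @ K
        A[:n, n] = d                           # D 1
        A[n, :n] = d @ K                       # 1^T D K
        A[n, n] = d.sum()                      # 1^T D 1
        rhs = np.append(d * y_full, np.sum(d * y_full))
        sol = np.linalg.lstsq(A, rhs, rcond=None)[0]
        alpha, b = sol[:n], sol[n]
        f = K @ alpha + b
        # Keep the best iterate, since full Newton steps without line search may oscillate.
        obj = objective(alpha, f)
        if best is None or obj < best[0]:
            best = (obj, alpha, b)
        # New hinge-active set: labeled points with y_i f(x_i) < 1.
        new_active = np.zeros(n, dtype=bool)
        new_active[:l] = y * f[:l] < 1.0
        if np.array_equal(new_active, active):
            break
        active = new_active
    _, alpha, b = best
    return X, alpha, b, kernel_gamma


def decision_function(model, X_new):
    """Real-valued score f(x) = sum_j alpha_j k(x, x_j) + b."""
    X, alpha, b, kernel_gamma = model
    return rbf_kernel(X_new, X, kernel_gamma) @ alpha + b
```

In a residue-level setting, each row would be one residue's feature vector, with binding residues as the positive (minority) class. Setting `c_pos` larger than `c_neg`, for example in proportion to the class ratio, is one common way to counter the imbalance; the values actually used in the paper are not given here.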
Keywords: Proteins, Amino acids, Support vector machines, Predictive models, Laplace equations, Standards, Training, evolutionary information, biochemistry, molecular biophysics, RNA, mutual interaction propensities, sequence based prediction, microRNA binding residues, cost sensitive Laplacian support vector machines, unlabeled data, semisupervised learning, CS-LapSVM, hybrid feature, amino acid sequence, position specific scoring matrices, biochemical properties, Laplacian support vector machine, cost-sensitive learning, miRNA-binding residues
Jian-Sheng Wu, Zhi-Hua Zhou, "Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 752-759, May-June 2013, doi:10.1109/TCBB.2013.75