Issue No. 04, July-Aug. 2013 (vol. 10)
pp. 1017-1031
Yuan Zhu , Dept. of Math., Guangdong Univ. of Finance & Econ., Guangzhou, China
Weiqiang Zhou , Dept. of Electron. Eng., City Univ. of Hong Kong, Kowloon, China
Dao-Qing Dai , Dept. of Math., Sun Yat-Sen Univ., Guangzhou, China
Hong Yan , Dept. of Electron. Eng., City Univ. of Hong Kong, Kowloon, China
ABSTRACT
Interactions between biomolecules play an essential role in various biological processes. For predicting DNA-binding or protein-binding proteins, many machine-learning-based techniques have used various types of features to represent the interface of the complexes, but these features describe only the properties of individual atoms in the interface and do not directly take into account information from neighboring atoms. This paper proposes a new feature representation method for biomolecular interfaces based on the theory of graph wavelets. The enhanced graph wavelet features (EGWF) provide an effective way to characterize interface features by adding physicochemical features and exploiting a graph wavelet formulation. In particular, the graph wavelet condenses the information around each center atom and thus makes the features of biomolecule-binding proteins more discriminative in the feature space. Experimental results show that EGWF performs effectively for predicting DNA-binding and protein-binding proteins in terms of the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC).
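The central idea of the abstract, that a graph wavelet condenses information from an atom's neighbors into the descriptor of the center atom, can be illustrated with a minimal Python sketch. It assumes the interface atoms have already been connected into a graph (for example, from alpha-shape edges) and that each atom carries a single numeric physicochemical value. It uses a Haar-style hop-ring wavelet in the spirit of Crovella and Kolaczyk's graph wavelets, not the authors' exact EGWF formulation, which the abstract does not specify. All function names are hypothetical.

```python
from collections import deque


def hop_distances(adj, source, max_hops):
    """Breadth-first search returning {node: hop distance} for nodes within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the outermost ring we need
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def haar_ring_weights(scale):
    """Average of the Haar mother wavelet (+1 on [0, 0.5), -1 on [0.5, 1))
    over scale + 1 equal sub-intervals of [0, 1), one per hop ring.
    The weights sum to zero, so a constant signal gives a zero coefficient."""
    n = scale + 1
    weights = []
    for h in range(n):
        lo, hi = h / n, (h + 1) / n
        pos = max(0.0, min(hi, 0.5) - lo)  # part of the interval where psi = +1
        neg = max(0.0, hi - max(lo, 0.5))  # part of the interval where psi = -1
        weights.append((pos - neg) * n)    # divide by the interval length 1/n
    return weights


def graph_wavelet_coefficient(adj, values, center, scale):
    """Weighted sum of hop-ring mean values around `center`, up to `scale` hops."""
    dist = hop_distances(adj, center, scale)
    ring_sums = [0.0] * (scale + 1)
    ring_counts = [0] * (scale + 1)
    for node, h in dist.items():
        ring_sums[h] += values[node]
        ring_counts[h] += 1
    coeff = 0.0
    for h, a in enumerate(haar_ring_weights(scale)):
        if ring_counts[h]:  # empty rings contribute nothing
            coeff += a * ring_sums[h] / ring_counts[h]
    return coeff


def wavelet_feature_vector(adj, values, center, max_scale):
    """Hypothetical per-atom descriptor: the raw value followed by the
    wavelet coefficients at scales 1..max_scale."""
    return [values[center]] + [
        graph_wavelet_coefficient(adj, values, center, j)
        for j in range(1, max_scale + 1)
    ]


# Usage: a four-atom chain 0-1-2-3 carrying the values 1, 2, 3, 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(wavelet_feature_vector(adj, values, 0, 2))
```

On this chain, the scale-1 coefficient at atom 0 is 1 - 2 = -1, because the center ring is weighted +1 and the first-neighbor ring -1. A neighborhood with constant values always yields a coefficient of zero, since the ring weights sum to zero; the coefficients therefore encode how an atom differs from its surroundings rather than its absolute value.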
INDEX TERMS
proteins, biological techniques, biology computing, DNA, graph theory, molecular biophysics, receiver operating characteristic, DNA-binding proteins, protein-binding proteins, enhanced graph wavelet features, biomolecule interactions, machine learning based techniques, complex interface, feature representation method, biomolecular interfaces, graph wavelet theory, EGWF, interface feature, physicochemical features, graph wavelet formulation, protein feature discrimination, biomolecule binding proteins, area under ROC curve, Educational institutions, Bioinformatics, Feature extraction, Computational biology, Correlation, Atomic measurements, alpha shape model, Protein-protein interaction, protein-DNA interaction, graph wavelet
CITATION
Yuan Zhu, Weiqiang Zhou, Dao-Qing Dai, Hong Yan, "Identification of DNA-Binding and Protein-Binding Proteins Using Enhanced Graph Wavelet Features", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 4, pp. 1017-1031, July-Aug. 2013, doi:10.1109/TCBB.2013.117