Predicting Ligand Binding Residues and Functional Sites Using Multipositional Correlations with Graph Theoretic Clustering and Kernel CCA
July-Aug. 2012 (vol. 9 no. 4)
pp. 992-1001
Li Liao, Comput. & Inf. Sci. Dept., Univ. of Delaware, Newark, DE, USA
A. J. Gonzalez, Comput. & Inf. Sci. Dept., Univ. of Delaware, Newark, DE, USA
C. H. Wu, Center for Bioinf. & Comput. Biol., Univ. of Delaware, Newark, DE, USA
We present a new computational method for predicting ligand binding residues and functional sites in protein sequences. These residues and sites tend not only to be conserved but also to exhibit strong correlation, because selection pressure during evolution acts to maintain the required structure and/or function. To explore the effect of correlations among multiple positions in the sequences, the method combines graph theoretic clustering with kernel-based canonical correlation analysis (kCCA). It identifies binding and functional sites as the residues that exhibit strong correlation between their evolutionary characterization at those sites and the structure-based functional classification of the proteins within a functional family. Tests on two well-curated data sets show that prediction accuracy, as measured by Receiver Operating Characteristic (ROC) scores, improves significantly when multipositional correlations are accounted for.
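
The graph theoretic clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how pairwise correlation is measured or how the graph is clustered, so the sketch assumes mutual information between alignment columns as the correlation measure, an arbitrary threshold of 0.5 bits, and maximal cliques (found with the Bron-Kerbosch algorithm) as the clusters. The toy alignment and all function names are hypothetical, and the kCCA scoring step is not covered.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Assumption: MI stands in for the pairwise correlation measure,
    which the abstract does not specify.
    """
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


def correlation_graph(alignment, threshold):
    """Link every pair of columns whose mutual information >= threshold."""
    columns = list(zip(*alignment))
    graph = {i: set() for i in range(len(columns))}
    for i, j in combinations(range(len(columns)), 2):
        if mutual_information(columns[i], columns[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def maximal_cliques(graph):
    """Yield each maximal clique as a sorted list (Bron-Kerbosch with pivoting)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield sorted(clique)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda v: len(graph[v] & candidates))
        for v in list(candidates - graph[pivot]):
            yield from expand(clique | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(graph), set())


# Hypothetical toy alignment: columns 0-2 co-vary, column 3 varies independently.
toy_alignment = ["ACDK", "ACDR", "GTEK", "GTER"]

graph = correlation_graph(toy_alignment, threshold=0.5)  # threshold is an assumption
groups = [c for c in maximal_cliques(graph) if len(c) >= 2]
print(groups)
```

Each group is a set of alignment positions that are mutually correlated, which is the kind of multipositional unit that could then be scored against the functional classification instead of scoring positions one at a time.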

Index Terms:
proteins, biology computing, evolutionary computation, graph theory, molecular biophysics, molecular configurations, receiver operating characteristic score, ligand binding residues, multipositional correlations, graph theoretic clustering, kernel-based canonical correlation analysis, computational method, protein sequences, evolution, structure-based functional classification, Correlation, Proteins, Kernel, Amino acids, Bioinformatics, Eigenvalues and eigenfunctions, Computational biology, cliques, Functional residues, specificity determining positions, multiple sequence alignments, kernel canonical correlation analysis
Citation:
Li Liao, A. J. Gonzalez, C. H. Wu, "Predicting Ligand Binding Residues and Functional Sites Using Multipositional Correlations with Graph Theoretic Clustering and Kernel CCA," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 4, pp. 992-1001, July-Aug. 2012, doi:10.1109/TCBB.2011.136