Issue No.02 - April-June (2008 vol.5)
pp: 183-197
ABSTRACT
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
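The informal notion of coupling used above (some amino acid combinations at two positions being more common than independence would predict) can be illustrated with a small sketch. It uses mutual information between two alignment columns, a commonly used covariation statistic; this is an assumption chosen for illustration and is not the graphical-model method of the paper. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings, one per aligned sequence.
    A value near 0 means the two columns vary independently; larger values
    mean some residue pairs occur more often than independence predicts.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n                        # observed joint frequency
        p_a_p_b = (counts_i[a] / n) * (counts_j[b] / n)  # expected if independent
        mi += p_ab * log2(p_ab / p_a_p_b)
    return mi


# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 2 varies independently of column 0.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]

coupled = mutual_information(toy_alignment, 0, 1)
independent = mutual_information(toy_alignment, 0, 2)
print(f"MI(0,1) = {coupled:.3f} bits, MI(0,2) = {independent:.3f} bits")
```

A raw statistic like this does not separate conservation from coupling and ignores structure and functional class, which is the gap the abstract says a formal probabilistic model is meant to fill.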
INDEX TERMS
Correlated mutations, graphical models, evolutionary covariation, sequence-structure-function relationships, functional classification
CITATION
Naren Ramakrishnan, John Thomas, "Graphical Models of Residue Coupling in Protein Families", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 2, pp. 183-197, April-June 2008, doi:10.1109/TCBB.2007.70225