IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 1, January-March 2008


pp. 42-55

ABSTRACT

When structural fluctuations or measurement errors are taken into account, a single rigid structure may not be sufficient to represent a protein. One approach to this problem is to represent the possible conformations as a discrete set of observed conformations, called an ensemble. In this work, we follow a different, richer approach: we introduce a framework for estimating probability density functions in very high dimensions and apply it to represent ensembles of folded proteins. The proposed approach combines kernel density estimation, maximum likelihood, cross-validation, and bootstrapping. We present the underlying theoretical and computational framework and apply it to artificial data and to protein ensembles obtained from molecular dynamics simulations. We compare the results with those obtained experimentally, illustrating the potential and advantages of this representation.
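To make the combination of techniques named in the abstract concrete, here is a minimal one-dimensional sketch. It is not the paper's high-dimensional framework: the Gaussian kernel, the synthetic two-component data, the candidate bandwidth grid, and every function name (`gaussian_kde`, `loo_log_likelihood`, `select_bandwidth`, `bootstrap_std`) are assumptions chosen for illustration. The bandwidth is picked by maximizing a leave-one-out cross-validated log-likelihood, and the bootstrap gives a rough measure of how much the density estimate varies under resampling.

```python
import numpy as np

def gaussian_kde(train, query, h):
    """Evaluate a 1-D Gaussian kernel density estimate at the query points."""
    z = (query[:, None] - train[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (train.size * h * np.sqrt(2.0 * np.pi))

def loo_log_likelihood(x, h):
    """Mean leave-one-out log-likelihood of the samples for bandwidth h."""
    z = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(k, 0.0)  # each point is excluded from its own estimate
    dens = k.sum(axis=1) / (x.size - 1)
    return float(np.mean(np.log(np.maximum(dens, 1e-300))))

def select_bandwidth(x, candidates):
    """Pick the candidate bandwidth with the highest cross-validated likelihood."""
    scores = [loo_log_likelihood(x, h) for h in candidates]
    return float(candidates[int(np.argmax(scores))])

def bootstrap_std(x, query, h, n_boot=200, seed=0):
    """Bootstrap standard deviation of the density estimate at each query point."""
    rng = np.random.default_rng(seed)
    estimates = np.stack([
        gaussian_kde(rng.choice(x, size=x.size, replace=True), query, h)
        for _ in range(n_boot)
    ])
    return estimates.std(axis=0)

# Synthetic stand-in for one conformational coordinate (an assumption, not paper data).
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(1.0, 1.0, 150)])

candidates = np.logspace(-2, 1, 30)      # hypothetical bandwidth grid
h_best = select_bandwidth(samples, candidates)

grid = np.linspace(-5.0, 5.0, 201)
density = gaussian_kde(samples, grid, h_best)
spread = bootstrap_std(samples, grid, h_best)
```

The bandwidth is held fixed during the bootstrap because resampling with replacement produces duplicate points, and duplicates would push a leave-one-out likelihood toward arbitrarily small bandwidths.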

INDEX TERMS

protein ensembles, density estimation, Bayesian networks, graphical models, maximum likelihood, cross-validation, bootstrapping

CITATION

Guillermo Sapiro, Diego Rother, "Statistical Characterization of Protein Ensembles",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 5, no. 1, pp. 42-55, January-March 2008, doi:10.1109/TCBB.2007.1061