IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, vol. 8, Issue No. 05 - September/October



pp: 1208-1222

David J. John, Wake Forest University, Winston-Salem

James L. Norris, Wake Forest University, Winston-Salem

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.95

ABSTRACT

Modeling biological networks is difficult, but exploring this problem is essential for understanding the systems behavior of biological processes. In this contribution, we present a new continuous Bayesian graphical learning algorithm, developed for sparse data, to cotemporally model proteins in signaling networks and genes in transcriptional regulatory networks. In this continuous Bayesian algorithm, the correlation matrix is singular because the number of time points is less than the number of biological entities (genes or proteins). A suitable restriction on the degree of the graph's vertices is applied, and a Metropolis-Hastings algorithm is guided by a BIC-based posterior probability score. Ten independent and diverse runs of the algorithm are conducted so that the probability space is well explored. Diagnostics are developed to test the applicability of the algorithm to specific data sets; this is a major benefit of the methodology. This novel algorithm is applied to two time-course experimental data sets: 1) protein modification data identifying a potential signaling network in chondrocytes, and 2) gene expression data identifying the transcriptional regulatory network underlying dendritic cell maturation. For the protein study, the method gives high estimated posterior probabilities to many of the directed edges that are predicted by the literature; for the gene study, the method gives high posterior probabilities to many of the literature-predicted sibling edges. In simulations, the method gives substantially higher estimated posterior probabilities for true edges and true subnetworks than for their false counterparts.
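The general idea named in the abstract, a Metropolis-Hastings search over graph structures guided by a BIC-based score under a vertex-degree restriction, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the proposal moves, the exact score, or the form of the degree restriction, so everything below is an assumption made for illustration. The sketch assumes a Gaussian linear regression of each node on its cotemporal parents, the common approximation of the log posterior by -BIC/2, a single-edge-toggle proposal, and a cap on in-degree. All function names (`rss_with_intercept`, `node_score`, `is_ancestor`, `sample_edge_frequencies`, `make_toy_data`) and the simulated data are hypothetical.

```python
import math
import random


def rss_with_intercept(y, xs):
    """Residual sum of squares of y regressed on the columns in xs plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols]
         + [sum(u * v for u, v in zip(ci, y))] for ci in cols]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        if abs(a[i][i]) < 1e-12:
            return None  # numerically collinear regressors
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def node_score(data, child, parents):
    """Assumed local log-score -BIC/2 for one node given its parent set."""
    y = data[child]
    n = len(y)
    rss = rss_with_intercept(y, [data[p] for p in sorted(parents)])
    if rss is None:
        return -math.inf
    n_params = len(parents) + 2  # slopes + intercept + error variance
    bic = n * math.log(max(rss, 1e-12) / n) + n_params * math.log(n)
    return -0.5 * bic


def is_ancestor(parents, a, b):
    """True if a directed path a -> ... -> b exists in the current graph."""
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p == a:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def sample_edge_frequencies(data, max_parents=2, steps=4000, burn_in=500, seed=0):
    """Single-edge-toggle Metropolis-Hastings over DAGs; returns edge frequencies."""
    rng = random.Random(seed)
    m = len(data)
    parents = [set() for _ in range(m)]
    scores = [node_score(data, v, parents[v]) for v in range(m)]
    counts = {(i, j): 0 for i in range(m) for j in range(m) if i != j}
    for step in range(steps):
        i, j = rng.sample(range(m), 2)  # propose toggling edge i -> j
        proposal = parents[j] ^ {i}
        adding = i not in parents[j]
        # Reject additions that break the in-degree cap or close a directed cycle.
        admissible = not adding or (
            len(proposal) <= max_parents and not is_ancestor(parents, j, i))
        if admissible:
            new_score = node_score(data, j, proposal)
            delta = new_score - scores[j]  # only node j's local score changes
            if delta >= 0 or rng.random() < math.exp(delta):
                parents[j], scores[j] = proposal, new_score
        if step >= burn_in:
            for child in range(m):
                for p in parents[child]:
                    counts[(p, child)] += 1
    kept = steps - burn_in
    return {edge: c / kept for edge, c in counts.items()}


def make_toy_data(n=20, seed=1):
    """Simulated data: variable 1 depends on variable 0; variable 2 is independent."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [2 * v + rng.gauss(0, 0.1) for v in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    return [x0, x1, x2]


freq = sample_edge_frequencies(make_toy_data())
for edge, f in sorted(freq.items()):
    print(edge, round(f, 3))
```

Because the toggle proposal is symmetric, the acceptance probability reduces to the ratio of the (approximate) posterior scores. Under this Gaussian score the two orientations of a single edge are score-equivalent, so the sketch is best read through the combined frequency of `(0, 1)` and `(1, 0)`. Calling `sample_edge_frequencies` with several different `seed` values and averaging the results would mimic, in spirit only, the multiple independent runs mentioned in the abstract.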

INDEX TERMS

Biological system modeling, statistical computing, multivariate statistics, correlation and regression analysis, signal transduction networks, transcriptional regulatory networks, biological network modeling.

CITATION

David J. John, James L. Norris, "Continuous Cotemporal Probabilistic Modeling of Systems Biology Networks from Sparse Data",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 8, no. 5, pp. 1208-1222, September/October 2011, doi:10.1109/TCBB.2010.95