IEEE/ACM Transactions on Computational Biology and Bioinformatics

Issue No.03 - May/June (2011 vol.8)

pp: 683-697

Seiya Imoto, University of Tokyo, Tokyo

Hiromitsu Araki, Cell Innovator, Inc.

Masao Nagasaki, University of Tokyo, Tokyo

Cristin Print, University of Auckland, Auckland

D. Stephen Charnock-Jones, University of Cambridge, Cambridge

Yoshinori Tamada, University of Tokyo, Tokyo

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.68

ABSTRACT

We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm in a shorter time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs, to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses were impossible to realize without our proposed algorithm.

INDEX TERMS

Biology and genetics, gene networks, Bayesian network structure learning, gene expression data analysis.

CITATION

Seiya Imoto, Hiromitsu Araki, Masao Nagasaki, Cristin Print, D. Stephen Charnock-Jones, Yoshinori Tamada, "Estimating Genome-Wide Gene Networks Using Nonparametric Bayesian Network Models on Massively Parallel Computers",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol.8, no. 3, pp. 683-697, May/June 2011, doi:10.1109/TCBB.2010.68