Incorporating Gene Functions into Regression Analysis of DNA-Protein Binding Data and Gene Expression Data to Construct Transcriptional Networks
July-September 2008 (vol. 5 no. 3)
pp. 401-415
Useful information on transcriptional networks has been extracted by regression analyses of gene expression data and DNA-protein binding data. However, a potential limitation of these approaches is their assumption of a common and constant activity level of a transcription factor (TF) on all genes in any given experimental condition; for example, any TF is assumed to be either an activator or a repressor, but not both, although some TFs are known to be dual regulators. Rather than assuming a common linear regression model for all genes, we propose using separate regression models for various gene groups; the genes can be grouped based on their functions or on clustering results. Furthermore, to take advantage of the hierarchical structure of many existing gene function annotation systems, such as Gene Ontology (GO), we propose a shrinkage method that borrows information from relevant gene groups. Applications to a yeast dataset and simulations lend support to our proposed methods. In particular, we find that the shrinkage method consistently works well under various scenarios. We recommend the shrinkage method as a useful alternative to the existing methods.
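The contrast between a common regression model, separate per-group models, and a shrunken compromise can be illustrated with a minimal sketch. It is not the estimator developed in the article: the shrinkage here is a plain convex combination of the per-group and pooled least-squares fits, and the function names, the `weight` parameter, the flat (non-hierarchical) grouping, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (no intercept, for brevity)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def groupwise_fits(X, y, groups, weight=0.5):
    """Return pooled, per-group, and shrunken per-group coefficients.

    `weight` is a hypothetical tuning parameter: 1.0 keeps the per-group
    fit unchanged, 0.0 collapses every group onto the pooled fit.
    """
    pooled = ols(X, y)
    separate, shrunken = {}, {}
    for g in np.unique(groups):
        key = g.item()  # plain Python value as dictionary key
        mask = groups == g
        separate[key] = ols(X[mask], y[mask])
        # Illustrative shrinkage: pull the group estimate toward the pooled one.
        shrunken[key] = weight * separate[key] + (1.0 - weight) * pooled
    return pooled, separate, shrunken


# Simulated toy data: one TF that activates group 0 and represses group 1.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))            # simulated binding strength of one TF
groups = np.repeat([0, 1], n // 2)     # hypothetical functional gene groups
true_beta = np.where(groups == 0, 1.0, -1.0)
y = true_beta * X[:, 0] + 0.1 * rng.normal(size=n)

pooled, separate, shrunken = groupwise_fits(X, y, groups, weight=0.8)
print("pooled:", pooled)
print("separate:", separate)
print("shrunken:", shrunken)
```

In this toy setting the pooled coefficient averages out the opposite effects in the two groups, which is the dual-regulator problem the abstract describes, while the per-group fits recover the sign in each group. A fixed weight is the simplest possible choice; a shrinkage scheme that follows an annotation hierarchy would instead pull each group toward its related groups rather than toward a single pooled fit.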

Index Terms:
LASSO, Microarray, Shrinkage estimator, Stratified analysis, Transcription factor
Peng Wei, Wei Pan, "Incorporating Gene Functions into Regression Analysis of DNA-Protein Binding Data and Gene Expression Data to Construct Transcriptional Networks," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 401-415, July-Sept. 2008, doi:10.1109/TCBB.2007.1062