An Information Theoretic Exploratory Method for Learning Patterns of Conditional Gene Coexpression from Microarray Data
January-March 2008 (vol. 5 no. 1)
pp. 15-24
In this article, we introduce an exploratory framework for learning patterns of conditional co-expression in gene expression data. The main idea behind the proposed approach is to estimate how the information content shared by a set of M nodes in a network (where each node is associated with an expression profile) varies upon conditioning on a set of L conditioning variables (in the simplest case, a separate set of expression profiles). The method is non-parametric and is based on the concept of statistical co-information, which, unlike conventional correlation-based techniques, is not restricted in scope to linear conditional dependency patterns. Moreover, such conditional co-expression relationships can potentially indicate regulatory interactions that do not manifest themselves when only pair-wise relationships are considered. A moment-based approximation of the co-information measure is derived that efficiently circumvents the problem of estimating high-dimensional multivariate probability density functions from the data, a task usually not viable because of the intrinsic sample size limitations that characterize expression level measurements. By applying the proposed exploratory method, we analyzed a whole-genome microarray assay of the eukaryote Saccharomyces cerevisiae and were able to learn statistically significant patterns of conditional co-expression. A selection of such interactions that carry a meaningful biological interpretation is discussed.
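
The point that conditioning can reveal dependencies invisible to pair-wise analysis can be illustrated with a toy example. The sketch below is not the moment-based approximation described in the abstract; it assumes discrete (binary) variables and uses a simple plug-in entropy estimate. The function names and the XOR data are hypothetical, and the sign convention for co-information (pair-wise mutual information minus conditional mutual information) is an assumption, since conventions differ across the literature.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


# Toy data (illustrative only): X and Y are independent fair bits and
# Z = X XOR Y. Each (X, Y) combination appears exactly once.
pairs = list(product([0, 1], repeat=2))
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z = [a ^ b for a, b in pairs]

mi = mutual_information(x, y)                    # pair-wise dependence of X and Y
cmi = conditional_mutual_information(x, y, z)    # dependence of X and Y given Z
co_information = mi - cmi                        # assumed sign convention

print(f"I(X;Y) = {mi:.3f} bits")
print(f"I(X;Y|Z) = {cmi:.3f} bits")
print(f"co-information = {co_information:.3f} bits")
```

In this toy case X and Y share 0 bits pair-wise but 1 bit once Z is known, so a pair-wise screen would miss the relationship entirely. With continuous expression profiles and few samples, a plug-in estimate of this kind is not practical, which is the limitation the moment-based approximation is meant to address.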

Index Terms:
Gene expression data, Statistical analysis, Information theory, Co-information, Entropy
Riccardo Boscolo, James C. Liao, Vwani P. Roychowdhury, "An Information Theoretic Exploratory Method for Learning Patterns of Conditional Gene Coexpression from Microarray Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 1, pp. 15-24, Jan.-March 2008, doi:10.1109/TCBB.2007.1056