Biclustering Models for Structured Microarray Data
October-December 2005 (vol. 2 no. 4)
pp. 316-329

Abstract: Microarrays have become a standard tool for investigating gene function, and more complex microarray experiments are increasingly being conducted. For example, an experiment may involve samples from several groups or may investigate changes in gene expression over time for several subjects, leading to large three-way data sets. In response to this increase in data complexity, we propose extensions to the plaid model, a biclustering method developed for the analysis of gene expression data. Because the method is model-based, it lends itself to the incorporation of additional structure such as external grouping or repeated measures. We describe how the extended models may be fitted and illustrate their use on real data.

Index Terms:
Biclustering, two-way clustering, overlapping clustering, partial supervision, repeated measures, three-way data.
Citation:
Heather L. Turner, Trevor C. Bailey, Wojtek J. Krzanowski, Cheryl A. Hemingway, "Biclustering Models for Structured Microarray Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 2, no. 4, pp. 316-329, Oct.-Dec. 2005, doi:10.1109/TCBB.2005.49