Issue No.04 - October-December (2009 vol.6)
pp: 615-628
ABSTRACT
Although clustering methods have rapidly become one of the standard computational approaches in the literature on microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Applied to the Rosetta compendium of expression profiles, the method yields biologically plausible results that extend previously published cluster analyses of these data.
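The similarity measure described in the abstract can be illustrated with a minimal sketch. It assumes that cluster assignments have already been drawn from a DPM sampler and are available as one label vector per retained sample; the function names, the toy labels, and the choice of one minus the co-clustering probability as the dissimilarity are illustrative assumptions, not details taken from the paper.

```python
def coclustering_probability(samples):
    """Estimate P(gene i and gene j share a cluster) from sampled assignments.

    `samples` is a list of label vectors, one per retained sample;
    samples[s][i] is the cluster label of gene i in sample s.
    (Hypothetical input format, assumed for this sketch.)
    """
    n_samples = len(samples)
    n_genes = len(samples[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in samples:
        for i in range(n_genes):
            for j in range(n_genes):
                # Only equality of labels within a sample matters, so the
                # estimate is unaffected by cluster relabeling across samples.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_samples for c in row] for row in counts]


def dissimilarity(sim):
    """Assumed dissimilarity: one minus the co-clustering probability."""
    return [[1.0 - p for p in row] for row in sim]


# Toy, made-up assignments for 4 genes across 4 samples (not real data).
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 2, 2],
    [0, 1, 2, 2],
]
sim = coclustering_probability(samples)
dis = dissimilarity(sim)
```

The resulting symmetric matrix `dis` is the kind of input a standard linkage routine accepts. With SciPy, for example, it could be converted to condensed form with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage`.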
INDEX TERMS
Clustering, classification, and association rules, biology and genetics, bioinformatics (genome or protein) databases, statistical computing, stochastic processes, Monte Carlo.
CITATION
Carl Edward Rasmussen, Bernard J. de la Cruz, Zoubin Ghahramani, David L. Wild, "Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.6, no. 4, pp. 615-628, October-December 2009, doi:10.1109/TCBB.2007.70269