Issue No.04 - October-December (2009 vol.6)
pp: 594-604
Miquel Salicrú, Barcelona University, Spain
Sergi Vives, Barcelona University, Spain
Tian Zheng, Columbia University, New York
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis, and various techniques have been developed. Such analyses depend heavily on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data from microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Although the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.
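The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of clustering from confidence intervals rather than from a chosen distance; it is not the authors' procedure. It assumes, purely for illustration, that each gene has a few replicates per condition, that a 95% Student-t interval is computed for each gene and condition, and that two genes are linked when their intervals overlap in every condition, with clusters taken as the connected components of those links. The function names, the overlap rule, and the toy data are all hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
# Only the low-replication cases (3 to 5 replicates) are tabulated here.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776}


def confidence_interval(replicates):
    """Return a 95% t-interval (low, high) for the mean of the replicates."""
    n = len(replicates)
    mean = statistics.mean(replicates)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(replicates) / math.sqrt(n)
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def cluster_genes(data):
    """Group genes whose intervals overlap in every condition.

    data maps a gene name to a list of replicate lists, one per condition.
    This linking rule is an assumption made for this sketch only.
    """
    cis = {g: [confidence_interval(r) for r in conds] for g, conds in data.items()}
    genes = list(data)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if all(intervals_overlap(a, b) for a, b in zip(cis[g], cis[h])):
                parent[find(g)] = find(h)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: three replicates in each of two conditions (made-up values).
toy = {
    "geneA": [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]],
    "geneB": [[1.05, 1.0, 1.1], [2.05, 1.95, 2.0]],
    "geneC": [[5.0, 5.1, 4.9], [0.0, 0.1, -0.1]],
}
result = cluster_genes(toy)
print(result)
```

In this toy case geneA and geneB fall in one group and geneC stands alone, because the replicate spread, not a user-selected dissimilarity, decides which profiles are indistinguishable.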
Clustering analysis, confidence interval, gene expression data.
Miquel Salicrú, Sergi Vives, Tian Zheng, "Inferential Clustering Approach for Microarray Experiments with Replicated Measurements", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.6, no. 4, pp. 594-604, October-December 2009, doi:10.1109/TCBB.2008.106