Issue No.02 - March/April (2011 vol.8)
pp: 464-475
Itziar Irigoien, University of the Basque Country, Donostia
Sergi Vives, University of Barcelona, Barcelona
Concepción Arenas, University of Barcelona, Barcelona
ABSTRACT
Time course studies that combine microarray techniques with experimental replicates are very useful in biomedical research. For experiments with replicates, we present an alternative approach to selecting and clustering genes according to a new measure of association between genes. First, the procedure normalizes and standardizes the expression profile of each gene and then identifies scaling parameters that further minimize the distance between replicates of the same gene. Next, it filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For the genes without significant differences, we define a mean profile for each gene and use it to compute the distance between two genes. A hierarchical clustering procedure is then proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the remaining genes, those with significant differences between replicates, the procedure detects where the differences lie and either assigns each gene to the best-fitting previously identified profile or defines a new profile. We illustrate the procedure using simulated data and a representative data set from a microarray experiment with replication, and report the results.
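The first stages described in the abstract (standardizing profiles, filtering flat genes, averaging replicates, hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give the new association measure, the scaling parameters, the test for differences between replicates, or the compactness statistic, so the sketch substitutes z-score standardization, plain Euclidean distance, a hypothetical range threshold for flatness, and average linkage with a fixed number of clusters. All function names, the threshold, and the toy data are assumptions made for illustration.

```python
from math import dist
from statistics import mean, pstdev


def standardize(profile):
    """Center a profile to mean 0 and scale it to unit standard deviation."""
    m, s = mean(profile), pstdev(profile)
    if s == 0:
        return [0.0] * len(profile)
    return [(x - m) / s for x in profile]


def mean_profile(replicates):
    """Average the standardized replicates time point by time point."""
    standardized = [standardize(r) for r in replicates]
    return [mean(column) for column in zip(*standardized)]


def is_flat(replicates, min_range=0.5):
    """Hypothetical flatness rule (an assumption, not the paper's filter):
    the raw replicate-averaged profile varies by less than min_range."""
    averaged = [mean(column) for column in zip(*replicates)]
    return max(averaged) - min(averaged) < min_range


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage on Euclidean
    distance, used here as a stand-in for the paper's distance measure."""
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        return mean(dist(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy data (invented): gene -> list of replicates, each a time course.
data = {
    "gene_up_1": [[1, 2, 3, 4], [1.1, 2.1, 2.9, 4.2]],
    "gene_up_2": [[10, 20, 30, 40], [9, 21, 31, 39]],
    "gene_down": [[4, 3, 2, 1], [4.2, 3.1, 1.9, 1.0]],
    "gene_flat": [[5, 5.1, 5, 5.1], [5.1, 5, 5.1, 5]],
}

# Drop flat genes, then cluster the mean profiles of the remaining genes.
kept = {g: mean_profile(r) for g, r in data.items() if not is_flat(r)}
clusters = average_linkage(kept, n_clusters=2)
print(clusters)
```

On this toy input the flat gene is filtered out and the two rising genes end up in the same cluster despite their different expression scales, which is the effect standardization is meant to achieve. Fixing the number of clusters in advance is a simplification; the abstract states that the procedure determines the number of classes itself.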
INDEX TERMS
Cluster analysis, typical unit, gene profile, time course experiment, replicate.
CITATION
Itziar Irigoien, Sergi Vives, Concepción Arenas, "Microarray Time Course Experiments: Finding Profiles", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 464-475, March/April 2011, doi:10.1109/TCBB.2009.79