Reproducibility-Optimized Test Statistic for Ranking Genes in Microarray Studies
July-September 2008 (vol. 5 no. 3)
pp. 423-431
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression but could be extended to a wide range of other applications as well.
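The general idea of choosing a ranking statistic by its reproducibility can be illustrated with a minimal sketch. The abstract does not give the procedure's details, so everything specific here is an assumption made for illustration: the candidate family |mean difference| / (a0 + a1 * pooled SD), the top-k overlap between pairs of bootstrap resamples as the reproducibility measure, and all function and parameter names. The sketch also omits any comparison against a null (randomized) baseline, which a full procedure would likely need.

```python
import random
import statistics


def rank_genes(group_a, group_b, a0, a1):
    """Rank gene indices, best first, by |mean diff| / (a0 + a1 * pooled SD).

    group_a and group_b are lists of per-gene rows of expression values.
    The (a0, a1) family is an illustrative assumption, not the paper's.
    """
    scores = []
    for g, (xa, xb) in enumerate(zip(group_a, group_b)):
        diff = abs(statistics.fmean(xa) - statistics.fmean(xb))
        pooled_sd = ((statistics.variance(xa) + statistics.variance(xb)) / 2) ** 0.5
        denom = a0 + a1 * pooled_sd
        # Guard against a zero denominator (e.g. a0 = 0 and no variation).
        scores.append((diff / denom if denom > 0 else 0.0, g))
    return [g for _, g in sorted(scores, reverse=True)]


def resample(group, rng):
    """Bootstrap the sample columns, using the same columns for every gene."""
    n = len(group[0])
    while True:
        cols = [rng.randrange(n) for _ in range(n)]
        if len(set(cols)) > 1:  # redraw degenerate resamples of one column
            break
    return [[row[c] for c in cols] for row in group]


def reproducibility(group_a, group_b, a0, a1, top_k, n_boot, rng):
    """Mean top-k overlap between rankings from pairs of bootstrap resamples."""
    total = 0.0
    for _ in range(n_boot):
        r1 = rank_genes(resample(group_a, rng), resample(group_b, rng), a0, a1)
        r2 = rank_genes(resample(group_a, rng), resample(group_b, rng), a0, a1)
        total += len(set(r1[:top_k]) & set(r2[:top_k])) / top_k
    return total / n_boot


def select_statistic(group_a, group_b, candidates, top_k=10, n_boot=20, seed=0):
    """Return (score, (a0, a1)) for the most reproducible candidate."""
    best = None
    for a0, a1 in candidates:
        rng = random.Random(seed)  # reuse the same resamples for each candidate
        score = reproducibility(group_a, group_b, a0, a1, top_k, n_boot, rng)
        if best is None or score > best[0]:
            best = (score, (a0, a1))
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 50 genes, 6 samples per group; the first 5 genes are shifted.
    a = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(50)]
    b = [[rng.gauss(3 if g < 5 else 0, 1) for _ in range(6)] for g in range(50)]
    candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 1.0)]
    print(select_statistic(a, b, candidates, top_k=5))
```

With (a0, a1) = (1, 0) the score reduces to the absolute mean difference, and with (0, 1) to an ordinary t-type statistic, so the candidate list spans familiar special cases. The selection is driven by the data alone, which is the property the abstract emphasizes.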

Index Terms:
Microarray, gene expression, gene ranking, reproducibility, differential expression, bootstrap
Laura L. Elo, Sanna Filén, Riitta Lahesmaa, Tero Aittokallio, "Reproducibility-Optimized Test Statistic for Ranking Genes in Microarray Studies," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 5, no. 3, pp. 423-431, July-Sept. 2008, doi:10.1109/tcbb.2007.1078