Issue No.02 - March-April (2013 vol.10)
pp: 383-392
Cosmin Lazar , Dept. of Comput. Sci., Vrije Univ. Brussel, Brussels, Belgium
Jonatan Taminau , Dept. of Comput. Sci., Vrije Univ. Brussel, Brussels, Belgium
Stijn Meganck , Dept. of Comput. Sci., Vrije Univ. Brussel, Brussels, Belgium
David Steenhoff , Dept. of Comput. Sci., Vrije Univ. Brussel, Brussels, Belgium
Alain Coletta , IRIDIA, Univ. Libre de Bruxelles, Brussels, Belgium
David Y. Weiss Solis , IRIDIA, Univ. Libre de Bruxelles, Brussels, Belgium
Colin Molter , IRIDIA, Univ. Libre de Bruxelles, Brussels, Belgium
Robin Duque , IRIDIA, Univ. Libre de Bruxelles, Brussels, Belgium
Hugues Bersini , IRIDIA, Univ. Libre de Bruxelles, Brussels, Belgium
Ann Nowe , Dept. of Comput. Sci., Vrije Univ. Brussel, Brussels, Belgium
ABSTRACT
The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be overcome by merging or integrating data sets that originate from independent MAGE experiments designed to study the same biological problem. However, this process is hindered by batch effects, which are study-dependent and result in random data distortion; numerical transformations are therefore needed to make the integration of different data sets accurate and meaningful. Our contribution in this paper is twofold. First, we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions. Second, we introduce a new validation index for batch effect removal methods, based on the observation that samples from two independent studies drawn from the same population should exhibit similar probability density functions. We evaluated and compared the GENESHIFT method with four other state-of-the-art methods for batch effect removal: batch mean-centering, empirical Bayes (COMBAT), distance-weighted discrimination, and cross-platform normalization. Our validation framework employs several validation indices that provide complementary information about the efficiency of batch effect removal methods. The results show that none of the methods clearly outperforms the others. Moreover, most of the methods used for comparison perform very well with respect to some validation indices while performing very poorly with respect to others. GENESHIFT exhibits robust performance, and its average rank is the highest among the average ranks of all methods compared.
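The two ingredients named in the abstract, empirical density estimation and the inner product between two probability density functions, can be illustrated with a small sketch. This is not the authors' GENESHIFT implementation, which the abstract does not specify; it is a minimal illustration under stated assumptions: a Gaussian (Parzen) kernel with an arbitrary fixed bandwidth, a single additive offset per gene, and a plain grid search over candidate offsets. The function names `density_inner_product` and `best_shift` are hypothetical.

```python
import numpy as np


def density_inner_product(x, y, bandwidth):
    """Inner product of two Gaussian kernel density estimates.

    For Gaussian kernels with a common bandwidth h, the integral of the
    product of the two estimates has a closed form: the mean over all
    sample pairs of a Gaussian with variance 2*h**2 evaluated at x_i - y_j.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]
    var = 2.0 * bandwidth ** 2
    return float(np.mean(np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)))


def best_shift(reference, target, bandwidth=0.5, n_candidates=401):
    """Offset to add to `target` that maximizes its density overlap with `reference`.

    Assumption for illustration: the batch effect on one gene is a pure
    additive offset, searched over an evenly spaced grid of candidates.
    """
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    # Candidate offsets cover every shift for which the two samples can overlap.
    candidates = np.linspace(reference.min() - target.max(),
                             reference.max() - target.min(),
                             n_candidates)
    scores = [density_inner_product(reference, target + s, bandwidth)
              for s in candidates]
    return float(candidates[int(np.argmax(scores))])
```

Applied to the expression values of one gene in two batches, `best_shift(batch_a, batch_b)` returns the offset that brings the estimated density of `batch_b` into the closest agreement with that of `batch_a` under this measure. The bandwidth and the grid resolution are illustrative choices and would need tuning on real data.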
INDEX TERMS
Gene expression, Estimation, Sociology, Statistics, Data integration, Lungs, integrative analysis of gene expression microarrays, nonparametric methods, Batch effects, microarray data integration, distance measures between probability density functions, inner product, density estimation
CITATION
Cosmin Lazar, Jonatan Taminau, Stijn Meganck, David Steenhoff, Alain Coletta, David Y. Weiss Solis, Colin Molter, Robin Duque, Hugues Bersini, Ann Nowe, "GENESHIFT: A Nonparametric Approach for Integrating Microarray Gene Expression Data Based on the Inner Product as a Distance Measure between the Distributions of Genes", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 2, pp. 383-392, March-April 2013, doi:10.1109/TCBB.2013.12