GENESHIFT: A Nonparametric Approach for Integrating Microarray Gene Expression Data Based on the Inner Product as a Distance Measure between the Distributions of Genes
March-April 2013 (vol. 10 no. 2)
pp. 383-392
Cosmin Lazar, Vrije Universiteit Brussel, Brussels
Jonatan Taminau, Vrije Universiteit Brussel, Brussels
Stijn Meganck, Vrije Universiteit Brussel, Brussels
David Steenhoff, Vrije Universiteit Brussel, Brussels
Alain Coletta, Université Libre de Bruxelles, Brussels
David Y. Weiss Solis, Université Libre de Bruxelles, Brussels
Colin Molter, Université Libre de Bruxelles, Brussels
Robin Duque, Université Libre de Bruxelles, Brussels
Hugues Bersini, Université Libre de Bruxelles, Brussels
Ann Nowe, Vrije Universiteit Brussel, Brussels
The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be surmounted by merging or integrating data sets originating from independent MAGE experiments designed to study the same biological problem. However, this process is hindered by batch effects that are study-dependent and result in random data distortion; therefore, numerical transformations are needed to render the integration of different data sets accurate and meaningful. Our contribution in this paper is twofold. First, we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions. Second, we introduce a new validation index for batch effect removal methods, based on the observation that samples from two independent studies drawn from the same population should exhibit similar probability density functions. We evaluated and compared the GENESHIFT method with four other state-of-the-art methods for batch effect removal: batch-mean centering, empirical Bayes (COMBAT), distance-weighted discrimination, and cross-platform normalization. Several validation indices providing complementary information about the efficiency of batch effect removal methods were employed in our validation framework. The results show that none of the methods clearly outperforms the others. Moreover, most of the methods used for comparison perform very well with respect to some validation indices while performing very poorly with respect to others. GENESHIFT exhibits robust performance, and its average rank is the highest among the average ranks of all methods used for comparison.
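The abstract names two ingredients, empirical density estimation and the inner product between two probability density functions, without giving the algorithm itself. The following is a minimal sketch of how those two ingredients can be combined, not the authors' GENESHIFT procedure. It assumes, purely for illustration, a single gene measured in two batches, Gaussian kernel density estimates with a fixed bandwidth `h`, an additive batch offset, and a plain grid search; the names `kde_inner_product` and `best_shift` and all parameter values are hypothetical. It relies on the fact that the inner product of two Gaussian kernel density estimates has a closed form, so no numerical integration is needed.

```python
import math
import random


def kde_inner_product(x, y, h):
    """Inner product <f_x, f_y> of two Gaussian kernel density estimates.

    With Gaussian kernels of bandwidth h, the integral of the product of the
    two estimates equals the mean, over all pairs (a, b), of a Gaussian
    density with variance 2*h**2 evaluated at a - b.
    """
    var = 2.0 * h * h
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = sum(math.exp(-(a - b) ** 2 / (2.0 * var)) for a in x for b in y)
    return norm * total / (len(x) * len(y))


def best_shift(reference, batch, h=0.5, step=0.05, max_shift=5.0):
    """Grid-search the offset that, added to `batch`, maximizes the inner
    product with `reference` (assumed additive batch effect, illustration only)."""
    n_steps = int(round(max_shift / step))
    candidates = [k * step for k in range(-n_steps, n_steps + 1)]
    return max(
        candidates,
        key=lambda d: kde_inner_product(reference, [v + d for v in batch], h),
    )


# Synthetic data (not from the paper): one gene, two batches, offset of 2.
random.seed(0)
gene_batch_a = [random.gauss(7.0, 1.0) for _ in range(60)]
gene_batch_b = [random.gauss(9.0, 1.0) for _ in range(60)]

shift = best_shift(gene_batch_a, gene_batch_b)
aligned_b = [v + shift for v in gene_batch_b]

before = kde_inner_product(gene_batch_a, gene_batch_b, 0.5)
after = kde_inner_product(gene_batch_a, aligned_b, 0.5)
print(f"estimated shift: {shift:.2f}")
print(f"inner product before: {before:.4f}, after: {after:.4f}")
```

A larger inner product means the two estimated densities overlap more, which is why the sketch maximizes it. The same quantity, computed before and after correction, also illustrates the idea behind the validation index mentioned in the abstract: samples drawn from the same population should have similar densities.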
Index Terms:
Gene expression, Estimation, Sociology, Statistics, Data integration, Lungs, integrative analysis of gene expression microarrays, nonparametric methods, Batch effects, microarray data integration, distance measures between probability density functions, inner product, density estimation
Citation:
Cosmin Lazar, Jonatan Taminau, Stijn Meganck, David Steenhoff, Alain Coletta, David Y. Weiss Solis, Colin Molter, Robin Duque, Hugues Bersini, Ann Nowe, "GENESHIFT: A Nonparametric Approach for Integrating Microarray Gene Expression Data Based on the Inner Product as a Distance Measure between the Distributions of Genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, pp. 383-392, March-April 2013, doi:10.1109/TCBB.2013.12