Issue No. 03 - May/June (2011 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2010.73
Tianwei Yu, Emory University, Atlanta
Hesen Peng, Emory University, Atlanta
Wei Sun, University of North Carolina, Chapel Hill
Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream data analyses that require complete data. Nonlinear relationships between gene expression levels have not been well-utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. By simulations based on real microarray data, we show that incorporating nonlinear relationships could improve the accuracy of missing value imputation, both in terms of normalized root-mean-squared error and in terms of the preservation of the list of significant genes in statistical testing. In addition, we studied the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row (gene)-wise mean removal.
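The abstract names normalized root-mean-squared error as one accuracy measure but does not define it. A minimal sketch follows, assuming the definition common in the imputation literature: the RMSE between true and imputed values over the artificially removed entries, divided by the standard deviation of the true values. The function name `nrmse` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
import statistics


def nrmse(true_values, imputed_values):
    """RMSE of the imputed entries divided by the standard deviation
    of the true entries (assumed definition, not quoted from the paper)."""
    if len(true_values) != len(imputed_values):
        raise ValueError("inputs must have the same length")
    if len(true_values) < 2:
        raise ValueError("need at least two entries")
    # Mean squared error over the entries that were masked and then imputed.
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / len(true_values)
    # Population standard deviation of the held-out true values.
    sd = statistics.pstdev(true_values)
    if sd == 0:
        raise ValueError("true values have zero variance")
    return math.sqrt(mse) / sd


# Toy held-out values (hypothetical) and two candidate imputations.
true_vals = [1.0, 2.0, 3.0, 4.0]
mean_fill = [2.5] * 4           # impute every entry with the overall mean
close_fill = [1.1, 1.9, 3.2, 3.9]

print(nrmse(true_vals, mean_fill))
print(nrmse(true_vals, close_fill))
```

Under this definition, filling every missing entry with the mean of the true values gives a score of about 1, so values well below 1 indicate that an imputation method recovers more than the average level of the data.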
gene expression, statistical analysis, missing value.
T. Yu, W. Sun and H. Peng, "Incorporating Nonlinear Relationships in Microarray Missing Value Imputation," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 723-731, 2011.