Issue No. 03, May/June 2011 (vol. 8)
pp. 723-731
Tianwei Yu, Emory University, Atlanta
Hesen Peng, Emory University, Atlanta
Wei Sun, University of North Carolina, Chapel Hill
ABSTRACT
Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream analyses that require complete data. Nonlinear relationships between gene expression levels have not been well utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. Through simulations based on real microarray data, we show that incorporating nonlinear relationships can improve the accuracy of missing value imputation, both in terms of normalized root-mean-squared error (NRMSE) and in terms of preserving the list of significant genes in statistical testing. In addition, we study the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row (gene)-wise mean removal.
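To make the abstract's central claim concrete, the following Python sketch compares a linear and a nonlinear predictor on a toy pair of genes and scores both with the normalized root-mean-squared error. It is a hypothetical illustration, not the authors' imputation scheme: the synthetic data (a "target gene" that depends quadratically on a "predictor gene"), the helper names `nrmse`, `impute_linear` and `impute_knn_1d`, and the use of one-dimensional k-nearest-neighbour regression as the nonlinear model are all assumptions made for this example. NRMSE is taken here as the root-mean-squared error over the masked entries divided by the standard deviation of their true values, a definition common in the imputation literature; the paper's exact normalization may differ.

```python
import numpy as np


def nrmse(y_true, y_imputed):
    """RMSE over the masked entries divided by the std of their true values
    (an assumed, commonly used normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_imputed = np.asarray(y_imputed, dtype=float)
    rmse = np.sqrt(np.mean((y_imputed - y_true) ** 2))
    return rmse / np.std(y_true)


def impute_linear(x, y, missing):
    """Predict y[missing] from x with a least-squares line fitted on observed pairs."""
    obs = ~missing
    slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
    return slope * x[missing] + intercept


def impute_knn_1d(x, y, missing, k=5):
    """Predict each y[missing] as the mean of y over the k observed samples
    with the closest x values (a simple nonparametric, nonlinear regression)."""
    obs_idx = np.flatnonzero(~missing)
    preds = []
    for i in np.flatnonzero(missing):
        nearest = obs_idx[np.argsort(np.abs(x[obs_idx] - x[i]))[:k]]
        preds.append(y[nearest].mean())
    return np.array(preds)


# Hypothetical toy data: the "target gene" y is a nonlinear function of the
# "predictor gene" x, measured across n samples.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, n)
y = x ** 2 + rng.normal(0, 0.1, n)

# Mask 10% of the target gene's values, as in a missing-value simulation.
missing = np.zeros(n, dtype=bool)
missing[rng.choice(n, size=20, replace=False)] = True

r = np.corrcoef(x, y)[0, 1]
err_linear = nrmse(y[missing], impute_linear(x, y, missing))
err_knn = nrmse(y[missing], impute_knn_1d(x, y, missing))

print(f"Pearson r = {r:.3f}")
print(f"NRMSE, linear predictor = {err_linear:.3f}")
print(f"NRMSE, kNN predictor    = {err_knn:.3f}")
```

Because y depends on x through x², the Pearson correlation between the two genes is close to zero, so the linear predictor does little better than the gene's mean, while the nearest-neighbour predictor recovers the masked values closely. This is the kind of dependence that purely correlation-based methods can overlook.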
INDEX TERMS
gene expression, statistical analysis, missing value.
CITATION
Tianwei Yu, Hesen Peng, Wei Sun, "Incorporating Nonlinear Relationships in Microarray Missing Value Imputation," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 723-731, May/June 2011, doi:10.1109/TCBB.2010.73