Issue No.05 - Sept.-Oct. (2012 vol.9)
pp: 1281-1292
Lin Wan, Mol. & Comput. Biol. Program, Univ. of Southern California, Los Angeles, CA, USA
Fengzhu Sun, Mol. & Comput. Biol. Program, Univ. of Southern California, Los Angeles, CA, USA
ABSTRACT
RNA-Seq is widely used in transcriptome studies, and the detection of differentially expressed genes (DEGs) between two classes of individuals, e.g., cases versus controls, using RNA-Seq is of fundamental importance. Many statistical methods for DEG detection based on RNA-Seq data have been developed and most of them are based on the read counts mapped to individual genes. On the other hand, genes are composed of exons and the distribution of reads for the different exons can be heterogeneous. We hypothesize that the detection accuracy of differentially expressed genes can be increased by analyzing individual exons within a gene and then combining the results of the exons. We therefore developed a novel program, termed CEDER, to accurately detect DEGs by combining the significance of the exons. CEDER first tests for differentially expressed exons yielding a p-value for each, and then gives a score indicating the potential for a gene to be differentially expressed by integrating the p-values of the exons in the gene. We showed that CEDER can significantly increase the accuracy of existing methods for detecting DEGs on two benchmark RNA-Seq data sets and simulated datasets.
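The two-step idea described above (one p-value per exon, then a gene-level score from those p-values) can be illustrated with a minimal sketch. The abstract does not say which combined p-value statistic CEDER uses, so Fisher's method is assumed here purely for illustration; the function names `fisher_combine` and `score_genes`, and the exon p-values in the usage example, are hypothetical and are not taken from the CEDER software.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumed here, not confirmed as CEDER's statistic).

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i).
    Under the null and with independent tests, the statistic follows a
    chi-square distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-square survival function for even degrees of freedom 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, min(math.exp(-half) * total, 1.0)


def score_genes(exon_p_values):
    """Map each gene to the combined p-value of its exons (smaller = stronger evidence)."""
    return {gene: fisher_combine(ps)[1] for gene, ps in exon_p_values.items()}


if __name__ == "__main__":
    # Hypothetical exon-level p-values, for illustration only.
    example = {"geneA": [0.01, 0.20, 0.03], "geneB": [0.60, 0.45]}
    for gene, score in sorted(score_genes(example).items(), key=lambda kv: kv[1]):
        print(gene, score)
```

Because exons of the same gene are unlikely to be statistically independent, the combined value in this sketch is better read as a ranking score than as a calibrated p-value, which is consistent with the abstract's description of a score indicating the potential for a gene to be differentially expressed.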
INDEX TERMS
RNA, genetics, molecular biophysics, DEG detection, CEDER, differentially expressed gene, exons, RNA-Seq data, transcriptome, simulated datasets, Accuracy, Bioinformatics, Standards, Statistical analysis, Image edge detection, Genomics, combined p-value statistic, RNA-Seq, gene expression, high-throughput sequencing
CITATION
Lin Wan, Fengzhu Sun, "CEDER: Accurate Detection of Differentially Expressed Genes by Combining Significance of Exons Using RNA-Seq", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 5, pp. 1281-1292, Sept.-Oct. 2012, doi:10.1109/TCBB.2012.83