Issue No.01 - Jan.-Feb. (2013 vol.10)
pp: 98-108
Ye Yang, Bank of Nova Scotia (Scotiabank), Toronto, ON, Canada
Farnoosh Abbas Aghababazadeh, Dept. of Math. & Stat., Univ. of Ottawa, Ottawa, ON, Canada
David R. Bickel, Dept. of Biochem., Univ. of Ottawa, Ottawa, ON, Canada
ABSTRACT
Many genome-wide association studies have been conducted to identify single nucleotide polymorphisms (SNPs) that are associated with particular diseases or other traits. The local false discovery rate (LFDR) estimated using semiparametric models has enjoyed success in simultaneous inference. However, semiparametric LFDR estimators can be biased because they tend to overestimate the proportion of the nonassociated SNPs. We address the problem by adapting a simple parametric mixture model (PMM) and by comparing this model to the semiparametric mixture model (SMM) behind an LFDR estimator that is known to be conservatively biased. We also compare the PMM with a parametric nonmixture model (PNM). In our simulation studies, we thoroughly analyze the performance of the three models under different values of p1, a prior probability that is approximately equal to the proportion of SNPs that are associated with the disease. When p1 > 10%, the PMM generally performs better than the SMM. When p1 < 0.1%, the SMM outperforms the PMM. When p1 lies between 0.1% and 10%, both methods have about the same performance. In that setting, the PMM may be preferred since it has the advantage of supplying an estimate of the detectability level of the nonassociated SNPs.
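As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of an LFDR under a two-group parametric mixture. It is not the authors' implementation: the abstract does not state the PMM's densities, so the standard normal null, the symmetric normal alternative, the parameter name delta (a stand-in for a detectability level), and the grid-search fit are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def alt_pdf(z, delta):
    """ASSUMED alternative density: equal mix of N(+delta, 1) and N(-delta, 1)."""
    return 0.5 * (normal_pdf(z, delta) + normal_pdf(z, -delta))


def lfdr(z, p1, delta):
    """Posterior probability that a SNP with statistic z is nonassociated.

    p1 is the prior probability of association; the null density is
    ASSUMED to be standard normal.
    """
    p0 = 1.0 - p1
    null_part = p0 * normal_pdf(z)
    return null_part / (null_part + p1 * alt_pdf(z, delta))


def fit_mixture(zs, p1_grid, delta_grid):
    """Grid-search maximum likelihood for (p1, delta); a hypothetical helper."""
    best = None
    for p1 in p1_grid:
        for delta in delta_grid:
            loglik = sum(
                math.log((1.0 - p1) * normal_pdf(z) + p1 * alt_pdf(z, delta))
                for z in zs
            )
            if best is None or loglik > best[0]:
                best = (loglik, p1, delta)
    return best[1], best[2]
```

Under these assumptions, a statistic near zero yields an LFDR close to 1 - p1 or higher, while a statistic far from zero yields a small LFDR. Calling fit_mixture on the observed statistics and passing the fitted p1 and delta to lfdr gives a plug-in estimate for each SNP.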
INDEX TERMS
Diseases, Solid modeling, Bioinformatics, Estimation, Adaptation models, Analytical models, Standards, strength of statistical evidence, Empirical Bayes, genome-wide association studies, local false discovery rate, minimum description length, MDL, reduced likelihood, Type II maximum likelihood
CITATION
Ye Yang, Farnoosh Abbas Aghababazadeh, David R. Bickel, "Parametric Estimation of the Local False Discovery Rate for Identifying Genetic Associations", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.10, no. 1, pp. 98-108, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.140