Issue No.05 - September/October (2011 vol.8)
pp: 1431-1437
Ehud Aharoni, IBM Research Laboratory, Haifa
Hani Neuvirth, IBM Research Laboratory, Haifa
Saharon Rosset, Tel Aviv University, Tel Aviv
ABSTRACT
A common scenario in computational biology, in which a community of researchers conducts multiple statistical tests on one shared database, gives rise to the multiple hypothesis testing problem. Conventional procedures for solving this problem control the probability of false discovery by sacrificing some of the power of the tests. We suggest a scheme for controlling false discovery without any power loss: new samples are added for each use of the database, and the user is charged for their cost. The crux of the scheme is a carefully crafted pricing system that prices different user requests fairly, according to their demands, while keeping the probability of false discovery bounded. We demonstrate this idea in the context of HIV treatment research, where multiple researchers conduct tests on a repository of HIV samples.
INDEX TERMS
Family-wise error rate, multiple comparisons, Bonferroni method.
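The trade-off named in the abstract, lower power under a conventional correction versus extra samples to restore it, can be illustrated with a small calculation. The sketch below is not the paper's pricing scheme. It assumes a one-sided z-test with unit variance and a standardized effect size, and applies the Bonferroni method from the index terms. The function names (`z_test_power`, `samples_for_power`, `bonferroni_level`) are hypothetical and chosen for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def bonferroni_level(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def z_test_power(effect_size, n_samples, alpha):
    """Power of a one-sided z-test (assumed unit variance) at level alpha."""
    critical = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(critical - effect_size * sqrt(n_samples))


def samples_for_power(effect_size, alpha, target_power):
    """Smallest sample size whose power reaches target_power at level alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(target_power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    alpha, n_tests, effect, n = 0.05, 10, 0.2, 155  # illustrative values only
    corrected = bonferroni_level(alpha, n_tests)
    print("power, single test:      ", round(z_test_power(effect, n, alpha), 3))
    print("power, Bonferroni level: ", round(z_test_power(effect, n, corrected), 3))
    print("samples to restore power:", samples_for_power(effect, corrected, 0.8))
```

With the sample size fixed, the corrected level lowers the power of each test. Holding power fixed instead requires more samples, which is the cost that a scheme of the kind described in the abstract would pass on to the user.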
CITATION
Ehud Aharoni, Hani Neuvirth, Saharon Rosset, "The Quality Preserving Database: A Computational Framework for Encouraging Collaboration, Enhancing Power and Controlling False Discovery", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 5, pp. 1431-1437, September/October 2011, doi:10.1109/TCBB.2010.105