Issue No. 04 - October-December (2010 vol. 7)
pp. 719-726
Xin Lu, University of California, San Diego, CA
Anthony Gamst, University of California, San Diego, CA
Ronghui Xu, University of California, San Diego, CA
ABSTRACT
Great concerns have been raised about the reproducibility of gene signatures derived from high-throughput techniques such as microarrays. Studies analyzing similar samples often report poorly overlapping results, and p-values usually lack biological context. We propose a nonparametric ReDiscovery Curve (RDCurve) method to estimate how often an identified gene signature is rediscovered. Given a ranking procedure and a data set with replicated measurements, the RDCurve bootstraps the data set, repeatedly applies the ranking procedure, selects a subset of k important genes, and estimates the probability that this selected subset of genes is rediscovered. We also propose a permutation scheme to estimate a confidence band under the null hypothesis, against which the significance of the RDCurve can be judged. The method is nonparametric and model-independent. With the RDCurve, we can assess the signal-to-noise ratio of the data, compare ranking procedures in terms of their expected rediscovery rates, and choose the number of genes to report.
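The abstract describes the RDCurve only in outline, so the following is a minimal Python/NumPy sketch of one plausible reading for a two-group comparison: a reference top-k list is ranked on the full data, each bootstrap replicate resamples samples within groups, and the rediscovery rate at k is the average fraction of the reference top-k genes that reappear in the replicate's top-k. The ranking procedure (`t_rank`, absolute Welch t-statistic), the helper names `rd_curve` and `null_band`, the stratified bootstrap, and the label-permutation null with pointwise quantiles are all assumptions made for illustration; the authors' exact estimator and permutation scheme may differ.

```python
import numpy as np


def t_rank(X, y):
    """Hypothetical ranking procedure: order genes by |Welch t-statistic|.

    X is a (genes x samples) expression matrix, y holds 0/1 group labels.
    Returns gene indices from most to least important.
    """
    a, b = X[:, y == 0], X[:, y == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / se
    return np.argsort(-np.abs(t), kind="stable")


def rd_curve(X, y, rank_fn, ks, n_boot=200, rng=None):
    """Rediscovery rate for each k (an assumed estimator, not the paper's exact one).

    For each bootstrap replicate, count how many of the top-k genes ranked on
    the full data are ranked top-k again, then average that fraction over replicates.
    """
    rng = np.random.default_rng(rng)
    ref = rank_fn(X, y)
    rates = np.zeros(len(ks))
    for _ in range(n_boot):
        # Stratified bootstrap: resample samples with replacement within each group.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == g), size=int(np.sum(y == g)), replace=True)
            for g in np.unique(y)
        ])
        boot = rank_fn(X[:, idx], y[idx])
        for j, k in enumerate(ks):
            rates[j] += np.intersect1d(ref[:k], boot[:k]).size / k
    return rates / n_boot


def null_band(X, y, rank_fn, ks, n_perm=50, n_boot=100, alpha=0.05, rng=None):
    """Assumed permutation null: shuffle group labels to break any gene-group link,
    recompute the curve for each shuffle, and take pointwise quantiles."""
    rng = np.random.default_rng(rng)
    curves = np.array([
        rd_curve(X, rng.permutation(y), rank_fn, ks, n_boot, rng) for _ in range(n_perm)
    ])
    return np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)


# Toy usage on simulated data: 1000 genes, 10 samples per group,
# with the first 50 genes shifted upward in group 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = np.repeat([0, 1], 10)
X[:50, y == 1] += 2.0
ks = [10, 50, 200]
curve = rd_curve(X, y, t_rank, ks, n_boot=100, rng=1)
lo, hi = null_band(X, y, t_rank, ks, n_perm=20, n_boot=50, rng=2)
print("RDCurve:", curve.round(2))
print("null band:", lo.round(2), hi.round(2))
```

In this simulated setting one would expect the curve at k = 50 (the number of truly shifted genes) to sit well above the upper edge of the null band. Because the ranking procedure is passed in as a function, other ranking procedures could be compared on the same data simply by swapping `rank_fn`.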
INDEX TERMS
Biology and genetics, bootstrap, nonparametric statistics, gene ranking, reproducibility.
CITATION
Xin Lu, Anthony Gamst, Ronghui Xu, "RDCurve: A Nonparametric Method to Evaluate the Stability of Ranking Procedures", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 719-726, October-December 2010, doi:10.1109/TCBB.2008.138