Development of a Bayesian Framework for Determining Uncertainty in Receiver Operating Characteristic Curve Estimates
January 2010 (vol. 22 no. 1)
pp. 31-45
David R. Parker, Pacific Air Forces, Wright-Patterson AFB
Steven C. Gustafson, Air Force Institute of Technology, Wright-Patterson AFB
Mark E. Oxley, Air Force Institute of Technology, Wright-Patterson AFB
Timothy D. Ross, Air Force Research Laboratory Sensors Directorate, AFRL COMPASE Center, Wright-Patterson AFB
This research uses a Bayesian framework to develop probability densities for the receiver operating characteristic (ROC) curve. The ROC curve is a discrimination metric that may be used to quantify how well a detection system classifies targets and nontargets. The degree of uncertainty in ROC curve formulation is a concern that previous research has not adequately addressed. This research formulates a probability density for the ROC curve and characterizes its uncertainty using confidence bands. Methods for the generation and characterization of the probability densities of the ROC curve are specified and demonstrated, where the initial analysis employs beta densities to model target and nontarget samples of detection system output. For given target and nontarget data, given functional forms of the data densities (such as beta density forms) and given prior densities of the form parameters, the methods developed here provide exact performance metric probability densities.
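
The idea summarized above (beta densities for target and nontarget scores, priors on the beta parameters, and a resulting posterior over ROC operating points) can be illustrated with a small sketch. This is not the authors' method: the abstract states that the paper derives exact densities, whereas the sketch below approximates the posterior on a coarse parameter grid with a flat prior and reports pointwise 90% credible intervals at a few fixed thresholds rather than bands over the whole curve. All function names, the grid, the thresholds, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def beta_logpdf(x, a, b):
    """Log of the Beta(a, b) density at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) + log_norm

def beta_sf(t, a, b, steps=400):
    """Approximate P(X > t) for X ~ Beta(a, b) by midpoint integration on (t, 1)."""
    h = (1.0 - t) / steps
    return h * sum(math.exp(beta_logpdf(t + (i + 0.5) * h, a, b))
                   for i in range(steps))

def grid_posterior(data, grid):
    """Normalized posterior weights over (a, b) grid points under a flat prior."""
    log_w = [sum(beta_logpdf(x, a, b) for x in data) for a, b in grid]
    peak = max(log_w)  # subtract the peak to avoid underflow in exp
    w = [math.exp(v - peak) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative posterior weight reaches q."""
    acc = 0.0
    pairs = sorted(zip(values, weights))
    for value, weight in pairs:
        acc += weight
        if acc >= q:
            return value
    return pairs[-1][0]

def rate_interval(t, grid, post, level=0.90):
    """(lower, posterior mean, upper) for P(score > t) at threshold t."""
    rates = [beta_sf(t, a, b) for a, b in grid]
    mean = sum(r * w for r, w in zip(rates, post))
    tail = (1.0 - level) / 2.0
    return (weighted_quantile(rates, post, tail), mean,
            weighted_quantile(rates, post, 1.0 - tail))

# Synthetic detector scores in (0, 1); the generating parameters are made up.
random.seed(0)
targets = [random.betavariate(5, 2) for _ in range(40)]
nontargets = [random.betavariate(2, 5) for _ in range(40)]

# Assumed grid: shape parameters in [1, 8] in steps of 0.5, so densities stay bounded.
grid = [(i / 2, j / 2) for i in range(2, 17) for j in range(2, 17)]
post_t = grid_posterior(targets, grid)
post_n = grid_posterior(nontargets, grid)

# Each threshold gives one ROC operating point (FPR, TPR) with its own intervals.
bands = {}
for t in (0.3, 0.5, 0.7):
    fpr = rate_interval(t, grid, post_n)
    tpr = rate_interval(t, grid, post_t)
    bands[t] = (fpr, tpr)
    print(f"t={t:.1f}  FPR {fpr[0]:.3f}..{fpr[2]:.3f}  TPR {tpr[0]:.3f}..{tpr[2]:.3f}")
```

Because the grid posterior is a discrete distribution, the interval endpoints are weighted quantiles and no sampling is needed. A finer grid or a different prior would change the numbers, and intervals at a fixed false positive rate would additionally require inverting the nontarget distribution.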

Index Terms:
Performance evaluation, performance metrics, receiver operating characteristic, ROC curves, uncertainty estimation, target detection.
Citation:
David R. Parker, Steven C. Gustafson, Mark E. Oxley, Timothy D. Ross, "Development of a Bayesian Framework for Determining Uncertainty in Receiver Operating Characteristic Curve Estimates," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 1, pp. 31-45, Jan. 2010, doi:10.1109/TKDE.2009.50