A Case Study of Applying Boosting Naive Bayes to Claim Fraud Diagnosis
May 2004 (vol. 16 no. 5)
pp. 612-620

Abstract: In this paper, we apply the weight of evidence reformulation of AdaBoosted naive Bayes scoring due to Ridgeway et al. [38] to the problem of diagnosing insurance claim fraud. The method combines the advantages of boosting with the explanatory power of the weight of evidence scoring framework. We present the results of an experimental evaluation with an emphasis on discriminatory power, ranking ability, and calibration of probability estimates. The data to which we apply the method consist of closed personal injury protection (PIP) automobile insurance claims from accidents that occurred in Massachusetts during 1993 and that were previously investigated for suspicion of fraud by domain experts. The data mimic the most commonly occurring data configuration, that is, claim records consisting of information pertaining to several binary fraud indicators. The findings of the study reveal the method to be a valuable contribution to the design of intelligible, accountable, and efficient fraud detection support.
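
As an illustration of the weight of evidence idea the abstract refers to, the sketch below scores a claim with a single, unboosted naive Bayes model over binary fraud indicators: the posterior log-odds of fraud are the prior log-odds plus one additive weight per indicator. This is not the authors' implementation and it omits the AdaBoost step of Ridgeway et al. [38]; the function names, the Laplace smoothing constant `alpha`, and the toy data are all assumptions made for the example.

```python
import math

def fit_weights_of_evidence(X, y, alpha=1.0):
    """Estimate prior log-odds and per-indicator weights of evidence.

    X: list of rows of 0/1 indicator values; y: list of 0/1 labels (1 = fraud).
    alpha is a Laplace smoothing constant (an assumption, not from the paper).
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    prior = math.log((n_pos + alpha) / (n_neg + alpha))
    weights = []
    for j in range(len(X[0])):
        # Count how often indicator j fires within each class.
        on_pos = sum(row[j] for row, label in zip(X, y) if label == 1)
        on_neg = sum(row[j] for row, label in zip(X, y) if label == 0)
        p1 = (on_pos + alpha) / (n_pos + 2 * alpha)  # P(x_j = 1 | fraud)
        p0 = (on_neg + alpha) / (n_neg + 2 * alpha)  # P(x_j = 1 | no fraud)
        # Weight of evidence when the indicator is present / absent.
        weights.append((math.log(p1 / p0), math.log((1 - p1) / (1 - p0))))
    return prior, weights

def score(row, prior, weights):
    """Return (log-odds, probability of fraud) for one claim."""
    evidence = sum(w_on if x else w_off for x, (w_on, w_off) in zip(row, weights))
    log_odds = prior + evidence
    return log_odds, 1.0 / (1.0 + math.exp(-log_odds))

# Made-up toy data: two binary indicators, six claims (hypothetical values).
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
prior, weights = fit_weights_of_evidence(X, y)
log_odds, prob = score([1, 1], prior, weights)
```

On this toy data, a claim with both indicators present gets a fraud probability of about 0.86. Because the score is a sum, each indicator's contribution can be read off separately, which is the explanatory property the abstract highlights.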

[1] E. Bauer and R. Kohavi, "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants," Machine Learning, vol. 36, nos. 1-2, pp. 105-139, 1999.
[2] B. Becker, R. Kohavi, and D. Sommerfield, "Visualizing the Simple Bayesian Classifier," Information Visualization in Data Mining and Knowledge Discovery, U.M. Fayyad, G. Grinstein, and A. Wierse, eds., Morgan Kaufmann, 2001.
[3] P.N. Bennett, "Assessing the Calibration of Naive Bayes' Posterior Estimates," Technical Report CMU-CS-00-155, Carnegie Mellon Univ., School of Computer Science, Computer Science Dept., Pittsburgh, Penn., 2000.
[4] J.O. Berger, Statistical Decision Theory and Bayesian Analysis. Springer, 1993.
[5] J.M. Bernardo and A.F.M. Smith, Bayesian Theory. Wiley, 2001.
[6] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford Univ. Press, 1995.
[7] L. Breiman, "Bagging Predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[8] L. Breiman, "Arcing Classifiers (with discussion)," Annals of Statistics, vol. 26, no. 3, pp. 801-849, 1998.
[9] Canadian Coalition against Insurance Fraud, "Insurance Fraud," http:/, 2002.
[10] Coalition against Insurance Fraud, "Insurance Fraud: The Crime You Pay For," http://www.insurancefraud.orgfraud_ backgrounder.htm, 2002.
[11] Comité Européen des Assurances, "The European Insurance Anti-Fraud Guide," CEA Info Special Issue 4, Paris, France, May 1996.
[12] Comité Européen des Assurances, "The European Insurance Anti-Fraud Guide 1997 Update," CEA Info Special Issue 5, Paris, France, Oct. 1997.
[13] J.B. Copas, "Plotting p against x," J. Royal Statistical Soc.: Applied Statistics, vol. 32, no. 1, pp. 25-31, 1983.
[14] J. Risk and Insurance, special issue on insurance fraud, vol. 69, no. 3, R.A. Derrig, ed., 2002.
[15] P. Domingos and M. Pazzani, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Machine Learning, vol. 29, nos. 2-3, pp. 103-130, 1997.
[16] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification. Wiley, 2000.
[17] C. Elkan, "Boosting and Naive Bayesian Learning," Technical Report CS97-557, Univ. of California, Dept. Computer Science and Eng., San Diego, Calif., 1997.
[18] Y. Freund and R.E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Proc. Second European Conf. Computational Learning Theory, Mar. 1995.
[19] Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm," Proc. 13th Int'l Conf. Machine Learning, July 1996.
[20] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian Network Classifiers," Machine Learning, vol. 29, nos. 2-3, pp. 131-163, 1997.
[21] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin, Bayesian Data Analysis. Chapman & Hall/CRC, 1995.
[22] I.J. Good, The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press, 1965.
[23] I.J. Good, "Weight of Evidence: A Brief Survey," Proc. Second Valencia Int'l Meeting on Bayesian Statistics, Sept. 1983.
[24] D.J. Hand, "Statistical Methods in Diagnosis," Statistical Methods in Medical Research, vol. 1, no. 1, pp. 49-67, 1992.
[25] D.J. Hand, Construction and Assessment of Classification Rules. Wiley, 1997.
[26] D.J. Hand and K. Yu, "Idiot's Bayes Not so Stupid after All?" Int'l Statistical Rev., vol. 69, no. 3, pp. 385-398, 2001.
[27] J.A. Hanley and B.J. McNeil, "The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve," Radiology, vol. 143, no. 1, pp. 29-36, 1982.
[28] M.J. Kearns and U.M. Vazirani, An Introduction to Computational Learning Theory. MIT Press, 1994.
[29] R. Kohavi, B. Becker, and D. Sommerfield, "Improving Simple Bayes," Proc. Ninth European Conf. Machine Learning, Apr. 1997.
[30] D.J.C. MacKay, "The Evidence Framework Applied to Classification Networks," Neural Computation, vol. 4, no. 5, pp. 720-736, 1992.
[31] J.W. O'Kane, G. Ridgeway, and D. Madigan, "Statistical Analysis of Clinical Variables to Predict the Outcome of Surgical Intervention in Patients with Knee Complaints," technical report, 1999.
[32] D. Opitz and R. Maclin, "Popular Ensemble Methods: An Empirical Study," J. Artificial Intelligence Research, vol. 11, pp. 169-198, 1999.
[33] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks for Plausible Inference. Morgan Kaufmann, 1988.
[34] F. Provost and T. Fawcett, "Robust Classification for Imprecise Environments," Machine Learning, vol. 42, no. 3, pp. 203-231, 2001.
[35] F. Provost, T. Fawcett, and R. Kohavi, "The Case against Accuracy Estimation for Comparing Classifiers," Proc. 15th Int'l Conf. Machine Learning, July 1998.
[36] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[37] G. Ridgeway, D. Madigan, and T. Richardson, "Boosting Methodology for Regression Problems," Proc. Seventh Int'l Workshop on Artificial Intelligence and Statistics, Jan. 1999.
[38] G. Ridgeway, D. Madigan, T. Richardson, and J.W. O'Kane, "Interpretable Boosted Naive Bayes Classification," Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining, Aug. 1998.
[39] R.E. Schapire, Y. Freund, P. Bartlett, and W.S. Lee, "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods," Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, 1998.
[40] D.J. Spiegelhalter, "Probabilistic Prediction in Patient Management and Clinical Trials," Statistics in Medicine, vol. 5, no. 5, pp. 421-433, 1986.
[41] D.J. Spiegelhalter and R.P. Knill-Jones, "Statistical and Knowledge-Based Approaches to Clinical Decision-Support Systems, with an Application in Gastroenterology (with discussion)," J. Royal Statistical Soc.: Statistics in Soc., vol. 147, no. 1, pp. 35-77, 1984.
[42] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. World Scientific, 2002.
[43] J.A. Swets, "ROC Analysis Applied to the Evaluation of Medical Imaging Techniques," Investigative Radiology, vol. 14, no. 2, pp. 109-121, 1979.
[44] J.A. Swets and R.M. Pickett, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press, 1982.
[45] D.M. Titterington, G.D. Murray, L.S. Murray, D.J. Spiegelhalter, A.M. Skene, J.D.F. Habbema, and G.J. Gelpke, "Comparison of Discrimination Techniques Applied to a Complex Data Set of Head Injured Patients (with discussion)," J. Royal Statistical Soc.: Statistics in Soc., vol. 144, no. 2, pp. 145-175, 1981.
[46] S. Viaene, "Learning to Detect Fraud from Enriched Insurance Claims Data: Context, Theory and Applications," PhD thesis, K.U. Leuven, Dept. of Applied Economics, KBC Insurance Research Chair, Leuven, Belgium, 2002.
[47] S. Viaene, R.A. Derrig, B. Baesens, and G. Dedene, "A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection," J. Risk and Insurance, vol. 69, no. 3, pp. 373-421, 2002.
[48] G.I. Webb, "MultiBoosting: A Technique for Combining Boosting and Wagging," Machine Learning, vol. 40, no. 2, pp. 159-196, 2000.
[49] H.I. Weisberg and R.A. Derrig, "Fraud and Automobile Insurance: A Report on the Baseline Study of Bodily Injury Claims in Massachusetts," J. Insurance Regulation, vol. 9, no. 4, pp. 497-541, 1991.
[50] H.I. Weisberg and R.A. Derrig, "Identification and Investigation of Suspicious Claims," AIB Cost Containment/Fraud Filing DOI Docket R95-12 (IFFR-170), Automobile Insurers Bureau of Massachusetts, Boston, Mass., 1995.
[51] H.I. Weisberg and R.A. Derrig, "Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims," Risques, vol. 35, pp. 75-101, July-Sept. 1998.
[52] B. Zadrozny and C. Elkan, "Learning and Making Decisions When Costs and Probabilities are Both Unknown," Proc. Seventh ACM SIGKDD Conf. Knowledge Discovery in Data Mining, Aug. 2001.
[53] Z. Zheng, G.I. Webb, and K.M. Ting, "Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees," Proc. 16th Int'l Conf. Machine Learning, June 1999.

Index Terms:
Data mining, pattern recognition, classifier design and evaluation, claim fraud detection, knowledge discovery, decision support.
Stijn Viaene, Richard A. Derrig, Guido Dedene, "A Case Study of Applying Boosting Naive Bayes to Claim Fraud Diagnosis," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 5, pp. 612-620, May 2004, doi:10.1109/TKDE.2004.1277822