Debiasing Training Data for Inductive Expert System Construction
May/June 2001 (vol. 13 no. 3)
pp. 497-512

Abstract: We study economic bias in the training data used to develop inductive expert systems. Such bias arises when an expert considers economic factors in decision making. We find that economic bias is particularly harmful when there is an economic misalignment between the expert and the user of the induced expert system; we refer to this misalignment as differential bias. The most significant contribution of this study is a training data debiasing procedure that uses a genetic algorithm to reconstruct training data that is relatively free of economic bias. We conduct a series of simulation experiments that show: 1) the economic performance of accuracy-seeking and value-seeking algorithms is statistically the same when the training data has economic bias, 2) both accuracy-seeking and value-seeking algorithms suffer in the presence of differential bias, 3) the proposed debiasing procedure significantly reduces the effect of differential bias, and 4) the debiasing procedure is quite robust with respect to estimation errors in its input parameters.
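The notion of differential bias can be illustrated with a small sketch. This is not the paper's debiasing procedure, only a toy model built on assumptions: a binary decision, a hypothetical helper `min_cost_decision`, and made-up misclassification costs for an expert and a user. Each party labels a case by picking the class with the lower expected misclassification cost, so different cost structures yield different labels for the same evidence.

```python
def min_cost_decision(p_positive, cost_false_positive, cost_false_negative):
    """Return 1 if labeling the case positive has the lower expected cost, else 0.

    Hypothetical helper: the costs are illustrative, not taken from the paper.
    """
    # Labeling positive is wrong with probability (1 - p_positive).
    expected_cost_if_positive = (1 - p_positive) * cost_false_positive
    # Labeling negative is wrong with probability p_positive.
    expected_cost_if_negative = p_positive * cost_false_negative
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0


# Probability that each case is truly positive (illustrative values).
probabilities = [0.1, 0.3, 0.5, 0.7, 0.9]

# Assumed expert economics: a missed positive is four times as costly.
expert_labels = [min_cost_decision(p, 1.0, 4.0) for p in probabilities]

# Assumed user economics: a false alarm is four times as costly.
user_labels = [min_cost_decision(p, 4.0, 1.0) for p in probabilities]

# Cases where the expert's label differs from what the user would prefer.
disagreements = sum(e != u for e, u in zip(expert_labels, user_labels))

print(expert_labels, user_labels, disagreements)
```

Under these assumed costs, the expert and the user disagree on three of the five cases, so a system induced from the expert's labels would inherit decisions that are economically wrong for the user.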

Index Terms:
Inductive system design, expert bias, sequential decision making.
Citation:
Vijay S. Mookerjee, "Debiasing Training Data for Inductive Expert System Construction," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 3, pp. 497-512, May-June 2001, doi:10.1109/69.929904