Test Strategies for Cost-Sensitive Decision Trees
August 2006 (vol. 18 no. 8)
pp. 1055-1067
In medical diagnosis, doctors must often decide which medical tests (e.g., X-ray and blood tests) to order for a patient so as to minimize the total cost of medical tests and misdiagnosis. In this paper, we design cost-sensitive machine learning algorithms to model this learning and diagnosis process. Medical tests correspond to attributes in machine learning whose values may be obtained at a cost (attribute cost), and misdiagnoses correspond to misclassifications, which may also incur a cost (misclassification cost). We first propose a lazy decision tree learning algorithm that minimizes the sum of attribute costs and misclassification costs. We then design several novel "test strategies" that can request the values of unknown attributes at a cost (much as doctors order medical tests at a cost) in order to minimize the total cost for test examples (new patients). These test strategies correspond to different situations in real-world diagnosis. We evaluate the test strategies empirically and show that they are effective and outperform previous methods. Our results can be readily applied to real-world diagnosis tasks. A case study on heart disease is used throughout the paper.
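
The trade-off described in the abstract, paying for a test only when it lowers the sum of attribute cost and misclassification cost, can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the function names, the cost values, and the patient counts are all hypothetical, and the sketch looks only one test ahead on a two-class problem.

```python
from typing import Dict, List, Tuple

# Hypothetical misclassification costs, chosen only for illustration.
FP_COST = 200.0  # cost of labelling a healthy patient as sick
FN_COST = 600.0  # cost of labelling a sick patient as healthy

# A branch is a (positives, negatives) count of training examples.
Branch = Tuple[int, int]


def leaf_cost(pos: int, neg: int) -> float:
    """Misclassification cost of giving every example the cheaper label."""
    cost_if_predict_pos = neg * FP_COST  # all negatives become false positives
    cost_if_predict_neg = pos * FN_COST  # all positives become false negatives
    return min(cost_if_predict_pos, cost_if_predict_neg)


def split_cost(test_cost: float, branches: List[Branch]) -> float:
    """Cost of running one test on every example, then labelling each branch."""
    n_examples = sum(pos + neg for pos, neg in branches)
    return n_examples * test_cost + sum(leaf_cost(p, q) for p, q in branches)


def choose_action(
    pos: int, neg: int, tests: Dict[str, Tuple[float, List[Branch]]]
) -> Tuple[str, float]:
    """Return the cheapest option: stop and label now, or order one test."""
    best = ("stop", leaf_cost(pos, neg))
    for name, (test_cost, branches) in tests.items():
        total = split_cost(test_cost, branches)
        if total < best[1]:
            best = (name, total)
    return best


if __name__ == "__main__":
    # 30 sick and 70 healthy patients; both candidate tests are made up.
    candidates = {
        "blood_test": (10.0, [(25, 5), (5, 65)]),
        "x_ray": (120.0, [(28, 2), (2, 68)]),
    }
    print(choose_action(30, 70, candidates))
```

In this made-up setting the cheaper but less discriminating test wins, because the more accurate test costs more per patient than it saves in misclassification cost. A test that costs more than the misclassification cost it removes is never ordered, and the sketch falls back to labelling immediately.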

Index Terms:
Induction, concept learning, mining methods and algorithms, classification.
Charles X. Ling, Victor S. Sheng, Qiang Yang, "Test Strategies for Cost-Sensitive Decision Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 8, pp. 1055-1067, Aug. 2006, doi:10.1109/TKDE.2006.131