Issue No. 05 - May (2006 vol. 18)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.84
Qiang Yang, IEEE
In cost-sensitive learning, inductive learning algorithms have been extended to handle different types of costs to better represent misclassification errors. Most previous work has focused only on misclassification costs. In this paper, we address the equally important issue of how to handle the test costs associated with querying the missing values in a test case. When an attribute has a missing value in a test case, it may or may not be worthwhile to make the extra effort to obtain a value for that attribute or attributes, depending on how much the new value would improve accuracy. We consider how to integrate test-cost-sensitive learning with the handling of missing values in a unified framework that includes model building and a testing strategy. The testing strategy determines which attributes to test in order to minimize the sum of the classification costs and test costs. We show how to instantiate this framework in two popular machine learning algorithms: decision trees and the naive Bayesian method. We empirically evaluate the test-cost-sensitive methods for handling missing values on several data sets.
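The trade-off described in the abstract, paying a test cost only when the value obtained is expected to reduce the classification cost by more than the test costs, can be illustrated with a small sketch. This is not the paper's algorithm: it is a one-step decision for a single missing attribute under a naive-Bayes-style model, and the function names, the `cost[predicted][actual]` layout, and all numbers are assumptions made for illustration.

```python
from typing import Dict, Tuple

Probs = Dict[str, float]


def expected_classification_cost(posterior: Probs,
                                 cost: Dict[str, Dict[str, float]]) -> float:
    """Expected cost of the cheapest prediction under the given class posterior.

    cost[predicted][actual] is the (assumed) cost of predicting `predicted`
    when the true class is `actual`.
    """
    return min(
        sum(posterior[actual] * cost[predicted][actual] for actual in posterior)
        for predicted in posterior
    )


def should_test_attribute(prior: Probs,
                          likelihood: Dict[str, Probs],
                          cost: Dict[str, Dict[str, float]],
                          test_cost: float) -> Tuple[bool, float, float]:
    """Decide whether to pay `test_cost` to observe one missing attribute.

    likelihood[cls][value] is P(attribute = value | class = cls).
    Returns (test?, expected cost without testing, expected cost with testing).
    """
    cost_without = expected_classification_cost(prior, cost)

    # Collect every attribute value that appears in the likelihood tables.
    values = {v for table in likelihood.values() for v in table}

    cost_with = test_cost
    for value in values:
        # Joint P(class, value), then normalise to get P(class | value).
        joint = {c: prior[c] * likelihood[c].get(value, 0.0) for c in prior}
        p_value = sum(joint.values())
        if p_value == 0.0:
            continue  # this value cannot occur under the model
        posterior = {c: joint[c] / p_value for c in joint}
        cost_with += p_value * expected_classification_cost(posterior, cost)

    return cost_with < cost_without, cost_without, cost_with


if __name__ == "__main__":
    # Hypothetical two-class problem with one missing binary attribute.
    prior = {"pos": 0.5, "neg": 0.5}
    likelihood = {"pos": {"high": 0.9, "low": 0.1},
                  "neg": {"high": 0.2, "low": 0.8}}
    cost = {"pos": {"pos": 0.0, "neg": 100.0},   # false positive costs 100
            "neg": {"pos": 400.0, "neg": 0.0}}   # false negative costs 400
    for test_cost in (10.0, 30.0):
        print(test_cost, should_test_attribute(prior, likelihood, cost, test_cost))
```

With these made-up numbers, classifying immediately has an expected cost of 50, while observing the attribute lowers the expected classification cost to 30. A test priced at 10 is therefore worth performing (total 40), whereas a test priced at 30 is not (total 60).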
Cost-sensitive learning, decision trees, naive Bayes.
Q. Yang, X. Chai, C. Ling and R. Pan, "Test-Cost Sensitive Classification on Data with Missing Values," in IEEE Transactions on Knowledge & Data Engineering, vol. 18, no. 5, pp. 626-638, 2006.