Issue No.05 - May (2006 vol.18)
Qiang Yang, IEEE
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.84
In cost-sensitive learning, inductive learning algorithms have been extended to handle different types of costs so as to better represent misclassification errors. Most previous work has focused only on how to deal with misclassification costs. In this paper, we address the equally important issue of how to handle the test costs associated with querying the missing values in a test case. When an attribute has a missing value in a test case, it may or may not be worthwhile to make the extra effort to obtain a value for that attribute, depending on how much the new value would improve accuracy. We consider how to integrate test-cost-sensitive learning with the handling of missing values in a unified framework that includes model building and a testing strategy. The testing strategy determines which attributes to test in order to minimize the sum of the classification costs and test costs. We show how to instantiate this framework in two popular machine learning algorithms: decision trees and the naive Bayesian method. We empirically evaluate the test-cost-sensitive methods for handling missing values on several data sets.
Cost-sensitive learning, decision trees, naive Bayes.
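The trade-off described in the abstract, paying a test cost only when the expected reduction in misclassification cost outweighs it, can be illustrated with a small value-of-information calculation. The following is a minimal sketch under stated assumptions and is not the paper's algorithm: the function names, the cost matrix, the likelihood table, and the test costs are all hypothetical, and a single discrete attribute is assumed to be conditionally independent of everything else given the class, as in a naive Bayes model.

```python
from typing import Dict, Sequence


def expected_misclassification_cost(posterior: Sequence[float],
                                    cost: Sequence[Sequence[float]]) -> float:
    """Expected cost of the cheapest prediction.

    cost[i][j] is the (hypothetical) cost of predicting class i
    when the true class is j.
    """
    return min(sum(p * c for p, c in zip(posterior, row)) for row in cost)


def utility_of_testing(posterior: Sequence[float],
                       cost: Sequence[Sequence[float]],
                       value_likelihoods: Dict[str, Sequence[float]],
                       test_cost: float) -> float:
    """Expected cost saved by querying one missing attribute, minus its test cost.

    value_likelihoods[v][j] is P(attribute = v | class j). A positive
    result suggests the test is worth paying for under this simple model.
    """
    cost_now = expected_misclassification_cost(posterior, cost)
    cost_after = 0.0
    for likelihood in value_likelihoods.values():
        joint = [p * l for p, l in zip(posterior, likelihood)]
        p_value = sum(joint)  # probability of observing this value
        if p_value == 0.0:
            continue  # this value cannot occur under the current belief
        updated = [x / p_value for x in joint]  # Bayes update of the class posterior
        cost_after += p_value * expected_misclassification_cost(updated, cost)
    return cost_now - cost_after - test_cost


# Illustrative numbers only (assumptions, not data from the paper).
prior = [0.5, 0.5]                       # current belief over two classes
cost_matrix = [[0.0, 10.0], [10.0, 0.0]]  # symmetric misclassification costs
likelihoods = {"pos": [0.9, 0.2], "neg": [0.1, 0.8]}

for fee in (2.0, 4.0):
    gain = utility_of_testing(prior, cost_matrix, likelihoods, fee)
    decision = "perform the test" if gain > 0 else "skip the test"
    print(f"test cost {fee}: utility {gain:.2f} -> {decision}")
```

With several missing attributes, one simple strategy would be to evaluate this utility for each candidate and query the attribute with the largest positive value, repeating until no test has positive utility. That greedy loop is an assumption of this sketch rather than a description of the testing strategies evaluated in the paper.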
Qiang Yang, Charles Ling, Xiaoyong Chai, Rong Pan, "Test-Cost Sensitive Classification on Data with Missing Values", IEEE Transactions on Knowledge & Data Engineering, vol.18, no. 5, pp. 626-638, May 2006, doi:10.1109/TKDE.2006.84