Issue No. 06 - June (2010 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.33
Jun Du, The University of Western Ontario, London
Charles X. Ling, The University of Western Ontario, London
With the assistance of a domain expert, active learning can often build an accurate classifier while selecting or constructing fewer examples whose labels must be requested. However, previous work on active learning can only generate and ask specific queries. In real-world applications, domain experts (or oracles) are often more readily able to answer “generalized queries” with don't-care attributes. The power of such generalized queries is that one generalized query is often equivalent to many specific ones. However, overly general queries are not good, because the answers from the domain experts (or oracles) can be highly uncertain, which makes learning difficult. In this paper, we propose a novel active learning algorithm that asks good generalized queries. We then extend our algorithm to construct new, hierarchical features for both nominal and numeric attributes. We demonstrate experimentally that our new method asks significantly fewer queries than previous active learning methods, even when the initial labeled data set is very small and the oracle is inaccurate in its class probability estimates. Our method can be readily deployed in real-world data mining tasks where obtaining labeled examples is costly.
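The claim that one generalized query is equivalent to many specific ones can be illustrated with a minimal sketch. This is not the paper's algorithm: the `DONT_CARE` marker, the function names, and the toy attribute domains are all hypothetical, and only nominal attributes are covered.

```python
from itertools import product

DONT_CARE = "*"  # hypothetical marker for a don't-care attribute


def expand(query, domains):
    """Enumerate every specific example covered by a generalized query."""
    names = list(domains)
    # A don't-care attribute ranges over its whole domain; a fixed one keeps its value.
    choices = [domains[n] if query[n] == DONT_CARE else [query[n]] for n in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


def covers(query, example):
    """Return True if the specific example matches the generalized query."""
    return all(v == DONT_CARE or example[k] == v for k, v in query.items())


# Toy attribute domains, assumed purely for illustration.
domains = {
    "fever": ["yes", "no"],
    "cough": ["yes", "no"],
    "age": ["child", "adult", "senior"],
}

# One generalized query: fever is fixed, the other two attributes are don't-care.
query = {"fever": "yes", "cough": DONT_CARE, "age": DONT_CARE}

specific = expand(query, domains)
print(len(specific))  # 2 cough values x 3 age values = 6 specific queries
```

Each additional don't-care attribute multiplies the number of covered examples by the size of its domain, which is also why an overly general query can mix examples of different classes and draw an uncertain answer from the oracle.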
Active learning, domain expert, generalized query.
C. X. Ling and J. Du, "Asking Generalized Queries to Domain Experts to Improve Learning," in IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 6, pp. 812-825, 2010.