2012 Conference on Technologies and Applications of Artificial Intelligence (TAAI) (2012)
Nov. 16, 2012 to Nov. 18, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TAAI.2012.70
In active learning, as few raw samples as possible are queried to learn an accurate classifier. However, queried samples may suffer from low diversity if they are selected without considering sample content, in which case the classifier is trained inefficiently on similar queried samples. In this paper, an approach called ALUC is proposed to increase the diversity of queried uncertain samples. Raw samples are clustered based on the prior data distribution and sample uncertainty before they are queried. First, the cluster seeds are found according to the underlying data distribution, without defining the number of clusters in advance. The distance metric is then designed to generate small clusters where they contain uncertain samples. Consequently, the representative samples of the clusters are diverse in content and also informative to query. Experimental results on a synthetic dataset and real-world datasets show that the distance metric for clustering is effective at finding raw samples that are similar in content and uncertainty, and that ALUC is able to query informative and diverse samples that yield an accurate classifier.
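The abstract does not give ALUC's seed-finding procedure or its distance metric, so the following is only a rough sketch of the general idea (cluster unlabeled samples with an uncertainty-aware distance, then query one representative per cluster), not the authors' algorithm. The uncertainty measure, the multiplicative form of `scaled_distance`, the `alpha` and `radius` parameters, and the greedy leader clustering are all assumptions introduced here for illustration.

```python
import math

def uncertainty(p):
    """Binary uncertainty (assumed measure): 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)

def scaled_distance(x, y, ux, uy, alpha=2.0):
    """Euclidean distance stretched when either sample is uncertain.

    Assumed form, not the paper's metric: stretching distances near
    uncertain samples makes clusters in those regions smaller.
    """
    return math.dist(x, y) * (1.0 + alpha * max(ux, uy))

def leader_clusters(points, unc, radius):
    """Greedy leader clustering; the number of clusters is not preset."""
    seeds = []    # indices of samples that started a cluster
    members = {}  # seed index -> indices of samples in that cluster
    for i, x in enumerate(points):
        best, best_d = None, radius
        for s in seeds:
            d = scaled_distance(x, points[s], unc[i], unc[s])
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            # No seed within the radius: this sample starts a new cluster.
            seeds.append(i)
            members[i] = [i]
        else:
            members[best].append(i)
    return members

def select_queries(points, probs, radius, budget):
    """Pick the most uncertain sample of each cluster, most uncertain first."""
    unc = [uncertainty(p) for p in probs]
    clusters = leader_clusters(points, unc, radius)
    reps = [max(m, key=lambda i: unc[i]) for m in clusters.values()]
    reps.sort(key=lambda i: unc[i], reverse=True)
    return reps[:budget]

# Toy data: three well-separated groups; the middle group is near the boundary.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5), (2.6, 2.5)]
probs = [0.05, 0.1, 0.95, 0.8, 0.5, 0.45]  # hypothetical classifier outputs
queries = select_queries(points, probs, radius=1.0, budget=2)
```

Because each query comes from a different cluster, the selected samples cannot all sit in the same neighbourhood, which is the diversity property the abstract describes; a plain "top-k most uncertain" rule would have picked both samples from the middle group here.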
learning (artificial intelligence), pattern classification, pattern clustering, query processing, uncertainty handling
J. Fu, S. Lee and W. Wu, "Efficient Active Learning Based on Uncertain Clusters," 2012 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Tainan, Taiwan, 2013, pp. 157-164.