2012 IEEE 12th International Conference on Data Mining (2012)
Brussels, Belgium
Dec. 10, 2012 to Dec. 13, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2012.56
Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts to use unlabeled data to enhance learning. Traditional self-training algorithms assign labels to the unlabeled instances on which classifiers trained on the limited labeled data are most confident. In this paper, a self-training algorithm that decreases the disagreement region of hypotheses is presented. The algorithm supplements the training set with self-labeled instances, but only instances that greatly reduce the disagreement region of hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm can effectively improve classification performance.
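For orientation, the traditional confidence-based scheme mentioned in the abstract can be sketched as follows. This is a minimal illustration of classic self-training only, not the disagreement-region method the paper proposes. The nearest-centroid classifier, the distance-margin confidence score, and the names `centroids`, `predict_with_confidence`, `self_train`, `per_round`, and `rounds` are all assumptions made for the example.

```python
import math

def centroids(X, y):
    """Return {label: mean vector} computed from the labeled points."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def predict_with_confidence(cents, x):
    """Predict the nearest centroid's label.

    Confidence (an assumption of this sketch) is the gap between the
    distance to the runner-up centroid and the distance to the nearest one.
    """
    ranked = sorted((euclidean(c, x), label) for label, c in cents.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
    return best_label, margin

def self_train(X_lab, y_lab, X_unlab, per_round=1, rounds=10):
    """Classic self-training: repeatedly self-label the most confident points."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(X_lab, y_lab)  # retrain on the current training set
        scored = [(predict_with_confidence(cents, x), x) for x in pool]
        scored.sort(key=lambda t: t[0][1], reverse=True)  # most confident first
        for (label, _), x in scored[:per_round]:
            X_lab.append(x)
            y_lab.append(label)
            pool.remove(x)
    return centroids(X_lab, y_lab), X_lab, y_lab

# Toy usage: two labeled seeds and four unlabeled points on a diagonal.
model, X_all, y_all = self_train(
    [(0.0, 0.0), (10.0, 10.0)], [0, 1],
    [(1.0, 1.0), (9.0, 9.0), (4.0, 4.0), (6.5, 6.5)],
)
```

In this baseline, a point is added purely because the current classifier is confident about it. The abstract describes a different selection criterion, in which only instances that greatly reduce the disagreement region of hypotheses are added.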
self-training, semi-supervised learning
Y. Zhou, M. Kantarcioglu and B. Thuraisingham, "Self-Training with Selection-by-Rejection," 2012 IEEE 12th International Conference on Data Mining (ICDM), Brussels, Belgium, 2012, pp. 795-803.