Confidence-Based Active Learning
August 2006 (vol. 28 no. 8)
pp. 1251-1261
This paper proposes a new active learning approach, confidence-based active learning, for training a wide range of classifiers. The approach identifies and annotates uncertain samples, where the uncertainty of each sample is measured by its conditional error. It exploits the probability-preserving and ordering properties of current classifiers, calibrating their output scores to conditional error. The uncertainty of each input sample can therefore be estimated from its output score, and only samples whose uncertainty exceeds a user-defined threshold are selected for annotation. Although we cannot guarantee the optimality of the proposed approach, we find that it provides good performance and, compared with existing methods, is robust without additional computational effort. Following this approach, a new active learning method for support vector machines (SVMs) is implemented. A dynamic bin width allocation method is proposed to estimate sample conditional error accurately; this method adapts to the underlying probabilities. The effectiveness of the proposed approach is demonstrated on synthetic and real data sets, and its performance is compared with the widely used least-certain active learning method.
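The selection rule described in the abstract, calibrating a classifier's raw output score to a posterior, taking the conditional error as the uncertainty value, and querying only samples above a user-defined threshold, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sigmoid calibration is Platt-style with illustrative (unfitted) parameters `a` and `b`, and the function names, scores, and threshold are all hypothetical.

```python
import math

def sigmoid_calibrate(score, a=-1.5, b=0.0):
    """Map a raw classifier score (e.g., an SVM decision value) to an
    estimated posterior P(y=1 | x) via a sigmoid. In practice a and b
    would be fit on held-out data (Platt scaling); here they are
    illustrative constants."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def conditional_error(score):
    """Uncertainty value of a sample: its conditional error
    1 - max(p, 1 - p) under the calibrated posterior p."""
    p = sigmoid_calibrate(score)
    return 1.0 - max(p, 1.0 - p)

def select_uncertain(scores, threshold=0.2):
    """Confidence-based selection: return the indices of samples whose
    estimated conditional error exceeds the user-defined threshold."""
    return [i for i, s in enumerate(scores) if conditional_error(s) > threshold]

scores = [-3.1, -0.4, 0.05, 0.9, 2.7]  # raw decision values (made up)
picked = select_uncertain(scores, threshold=0.2)
```

Samples with scores far from the decision boundary calibrate to confident posteriors and are skipped; only the uncertain ones near the boundary are sent for annotation, which is the source of the method's labeling savings.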

[1] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, second ed. John Wiley & Sons, 2000.
[2] D. Cohn, Z. Ghahramani, and M.I. Jordan, “Active Learning with Statistical Models,” J. Artificial Intelligence Research, vol. 4, pp. 129-145, 1996.
[3] G. Schohn and D. Cohn, “Less Is More: Active Learning with Support Vector Machines,” Proc. 17th Int'l Conf. Machine Learning, 2000.
[4] Y. Freund, H.S. Seung, E. Shamir, and N. Tishby, “Selective Sampling Using the Query by Committee Algorithm,” Machine Learning, vol. 28, pp. 133-168, 1997.
[5] C. Campbell, N. Cristianini, and A. Smola, “Query Learning with Large Margin Classifiers,” Proc. 17th Int'l Conf. Machine Learning, 2000.
[6] S. Tong and D. Koller, “Support Vector Machine Active Learning with Applications to Text Classification,” J. Machine Learning Research, vol. 2, pp. 45-66, 2001.
[7] D.D. Lewis and J. Catlett, “Heterogeneous Uncertainty Sampling for Supervised Learning,” Proc. 11th Int'l Conf. Machine Learning, pp. 148-156, 1994.
[8] P. Mitra, C.A. Murthy, and S.K. Pal, “A Probabilistic Active Support Vector Learning Algorithm,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 3, Mar. 2004.
[9] J.M. Park, “Convergence and Application of Online Active Sampling Using Orthogonal Pillar Vectors,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, Sept. 2004.
[10] K. Fukumizu, “Statistical Active Learning in Multilayer Perceptrons,” IEEE Trans. Neural Networks, vol. 11, no. 1, pp. 17-26, Jan. 2000.
[11] M. Sassano, “An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation,” Proc. 40th Ann. Meeting of the Assoc. Computational Linguistics (ACL), July 2002.
[12] V. Vapnik, Statistical Learning Theory. John Wiley & Sons, 1998.
[13] V. Vapnik, The Nature of Statistical Learning Theory, second ed. Springer, 1999.
[14] J. Platt, “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods,” Advances in Large Margin Classifiers, MIT Press, 1999.
[15] B. Zadrozny and C. Elkan, “Transforming Classifier Scores into Accurate Multiclass Probability Estimates,” Proc. Eighth Int'l Conf. Knowledge Discovery and Data Mining, 2002.
[16] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[17] C.L. Blake and C.J. Merz, “UCI Repository of Machine Learning Databases,” Dept. of Information and Computer Science, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[18] T. Joachims, “Making Large-Scale SVM Learning Practical,” Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999.
[19] C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[20] A.K. Jain, R. Duin, and J. Mao, “Statistical Pattern Recognition: A Review,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, Jan. 2000.
[21] E.G. Kong and T.G. Dietterich, “Probability Estimation Using Error-Correcting Output Coding,” Proc. Int'l Conf. Artificial Intelligence and Soft Computing, 1997.
[22] M.P. Wand, “Data-Based Choice of Histogram Bin Width,” The Am. Statistician, vol. 51, no. 1, pp. 59-64, Feb. 1997.
[23] M. Li and I.K. Sethi, “SVM-Based Classifier Design with Controlled Confidence,” Proc. 17th Int'l Conf. Pattern Recognition, 2004.
[24] I.K. Sethi, “Data Mining: An Introduction,” Data Mining for Design and Manufacturing, D. Braha, ed., pp. 1-40, Kluwer Academic, 2001.
[25] L.J. Bain and M. Engelhardt, Introduction to Probability and Mathematical Statistics, second ed. Duxbury Press, 1991.
[26] J.H. Friedman, “On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality,” Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 55-77, 1997.
[27] J.H. Friedman and N.I. Fisher, “Bump Hunting in High-Dimensional Data,” Statistics and Computing, vol. 9, no. 2, pp. 123-143, 1999.
[28] J.C. Platt, “Fast Training of Support Vector Machines Using Sequential Minimal Optimization,” Advances in Kernel Methods: Support Vector Learning, B. Scholkopf, C.J.C. Burges, and A.J. Smola, eds., pp. 185-208, MIT Press, 1998.
[29] T.G. Dietterich, “Machine Learning for Sequential Data: A Review,” Lecture Notes in Computer Science, T. Caelli, ed., 2002.
[30] A. Blum, “On-Line Algorithms in Machine Learning,” Online Algorithms— the State of the Art, A. Fiat and G. Woeginger, eds., chapter 14, pp. 306-325, Springer, 1998.
[31] M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming: Theory and Algorithms, second ed. John Wiley & Sons, 1993.
[32] A.J. Izenman, “Recent Developments in Nonparametric Density Estimation,” J. Am. Statistical Assoc., vol. 86, no. 413, pp. 205-224, 1991.
[33] K. Brinker, “Incorporating Diversity in Active Learning with Support Vector Machines,” Proc. 20th Int'l Conf. Machine Learning, pp. 59-66, 2003.
[34] N. Roy and A. McCallum, “Toward Optimal Active Learning through Sampling Estimation of Error Reduction,” Proc. 18th Int'l Conf. Machine Learning, pp. 441-448, 2001.
[35] T. Luo, K. Kramer, D.B. Goldgof, L.O. Hall, S. Samson, A. Remsen, and T. Hopkins, “Active Learning to Recognize Multiple Types of Plankton,” J. Machine Learning Research, vol. 6, pp. 589-613, 2005.
[36] Y. Baram, R. El-Yaniv, and K. Luz, “Online Choice of Active Learning Algorithms,” J. Machine Learning Research, vol. 5, pp. 255-291, 2004.
[37] A. Pal and S.K. Pal, “Generalized Guard-Zone Algorithm (GGA) for Learning: Automatic Selection of Threshold,” Pattern Recognition, vol. 23, no. 3/4, pp. 325-335, 1990.
[38] T. Mitchell, “Generalization as Search,” Artificial Intelligence, vol. 28, pp. 203-226, 1982.
[39] M. Li, “Confidence-Based Classifier Design and Its Applications,” PhD dissertation, Oakland Univ., 2005.
[40] M. Li and I.K. Sethi, “Confidence-Based Classifier Design,” Pattern Recognition, vol. 39, no. 7, pp. 1230-1240, 2006.

Index Terms:
Active learning, error estimation, pattern classification, support vector machines.
Mingkun Li, Ishwar K. Sethi, "Confidence-Based Active Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1251-1261, Aug. 2006, doi:10.1109/TPAMI.2006.156