IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 1, January 2011, pp. 129-143

Ke Chen , The University of Manchester, Manchester

Shihai Wang , The University of Manchester, Manchester

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2010.92

ABSTRACT

Semi-supervised learning concerns the problem of learning in the presence of both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three fundamental semi-supervised assumptions, i.e., the smoothness, cluster, and manifold assumptions, into account simultaneously during boosting. In this paper, we propose a novel cost functional consisting of a margin cost on labeled data and a regularization penalty on unlabeled data, based on these three assumptions. Minimizing the proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorable results on benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including recently developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to previous work.
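The stagewise procedure the abstract describes, fitting weak learners to the negative functional gradient of a cost that combines a margin term on labeled data with a regularization penalty on unlabeled data, can be illustrated with a minimal sketch. This is not the paper's exact cost functional: the penalty below is a generic graph-smoothness surrogate, the weak learner is a decision stump, and all names (`ssl_boost`, `stump_fit`) and parameters (`lam`, `sigma`, `step`) are hypothetical choices for illustration.

```python
import numpy as np

def stump_fit(X, y, w):
    """Fit a decision stump (axis-aligned threshold) to weighted +/-1 targets."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: np.where(Z[:, j] <= t, s, -s)

def ssl_boost(Xl, yl, Xu, rounds=10, lam=0.5, sigma=1.0, step=0.5):
    """Hypothetical stagewise sketch: exponential margin cost on labeled data
    plus a quadratic graph-smoothness penalty on unlabeled data (a stand-in
    for the paper's regularization term, not its exact functional)."""
    X = np.vstack([Xl, Xu])
    n_l = len(Xl)
    # Gaussian affinity among unlabeled points (smoothness/manifold surrogate)
    d2 = ((Xu[:, None, :] - Xu[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    F = np.zeros(len(X))          # current ensemble output on all points
    learners = []
    for _ in range(rounds):
        # negative gradient of sum exp(-y F) w.r.t. F on labeled points:
        # direction y_i with magnitude exp(-y_i F_i)
        wl = np.exp(-yl * F[:n_l])
        # negative gradient of lam * sum_ij W_ij (F_i - F_j)^2 on unlabeled points
        gu = 2 * lam * (W @ F[n_l:] - W.sum(1) * F[n_l:])
        # fit the weak learner to the sign of the negative gradient,
        # weighted by its magnitude (AnyBoost-style functional gradient step)
        y_pseudo = np.concatenate([yl, np.sign(gu + 1e-12)])
        w = np.concatenate([wl, np.abs(gu)])
        h = stump_fit(X, y_pseudo, w / w.sum())
        learners.append(h)
        F = F + step * h(X)       # greedy stagewise update
    def predict(Z):
        return np.sign(sum(step * h(Z) for h in learners))
    return predict
```

The key point the sketch shows is structural: labeled and unlabeled points both contribute to the weak learner's weighted fitting problem, the former through the margin cost and the latter through the regularizer's gradient, so each boosting round is driven by both terms of the cost functional.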

INDEX TERMS

Semi-supervised learning, boosting framework, smoothness assumption, cluster assumption, manifold assumption, regularization.

CITATION

Ke Chen, Shihai Wang, "Semi-Supervised Learning via Regularized Boosting Working on Multiple Semi-Supervised Assumptions",

*IEEE Transactions on Pattern Analysis & Machine Intelligence*, vol. 33, no. 1, pp. 129-143, January 2011, doi:10.1109/TPAMI.2010.92

- [1] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold Regularization: A Geometric Framework for Learning from Examples,"
J. Machine Learning Research, vol. 7, pp. 2399-2434, 2006.- [2] K. Bennett, A. Demiriz, and R. Maclin, "Expoliting Unlabeled Data in Ensemble Methods,"
Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, pp. 289-296, 2002.- [3] A. Blum and S. Chawla, "Combining Labeled and Unlabeled Data Using Graph Mincuts,"
Proc. 10th Ann. Conf. Computational Learning Theory, pp. 92-100, 1998.- [4] A. Blum and S. Chawla, "Learning from Labeled and Unlabeled Data with Co-Training,"
Proc. Int'l Conf. Machine Learning, pp. 19-26, 2001.- [5] O. Bousquet, O. Chapelle, and M. Hein, "Measure Based Regularization,"
Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004.- [6] Y. Bengio, O.B. Alleau, and N. Le Roux, "Label Propagation and Quadratic Criterion,"
Semi-Supervised Learning, pp. 193-207, MIT Press, 2006.- [7] O. Chapelle, J. Weston, and B. Schölkopf, "Cluster Kernels for Semi-Supervised Learning,"
Advances in Neural Information Processing Systems, vol. 15, MIT Press, 2003.- [8] O. Chapelle and A. Zien, "Semi-Supervised Classification by Low Density Separation,"
Proc. 10th Int'l Workshop Artificial Intelligence and Statistics, pp. 57-64, 2005.- [9] O. Chapelle, B. Schölkopf, and A. Zien,
Semi-Supervised Learning. MIT Press 2006.- [10] O. Chapelle, V. Sindhwani, and S. Keerthi, "Optimization Techniques for Semi-Supervised Support Vector Machines,"
J. Machine Learning Research, vol. 9, pp. 203-223, 2008.- [11] N.V. Chawla and G. Karakoulas, "Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains,"
J. Artificial Intelligence Research, vol. 23, pp. 331-366, 2005.- [12] K. Chen and S. Wang, "Regularized Boost for Semi-Supervised Learning,"
Advances in Neural Information Processing Systems, vol. 20, MIT Press, 2007.- [13] M. Collins and Y. Singer, "Unsupervised Models for the Named Entity Classification,"
Proc. SIGDAT Conf. Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 100-110, 1999.- [14] F. d'Alché-Buc, Y. Grandvalet, and C. Ambroise, "Semi-Supervised MarginBoost,"
Advances in Neural Information Processing Systems, vol. 14, MIT Press, 2002.- [15] R. Duda, P. Hart, and D. Stork,
Pattern Classification, second ed. Wiley-Interscience, 2001.- [16] Y. Freund and R.E. Schapire, "Experiments with a New Boosting Algorithm,"
Proc. Int'l Conf. Machine Learning, pp. 148-156, 1996.- [17] Y. Grandvalet and Y. Begio, "Semi-Supervised Learning by Entropy Minimization,"
Advances in Neural Information Processing Systems, vol. 17, MIT Press, 2005.- [18] G. Haffari, "A Survey on Inductive Semi-Supervised Learning," technical report, Dept. of Computer Science, Simon Fraser Univ., 2006.
- [19] T. Hertz, A. Bar-Hillel, and D. Weinshall, "Boosting Margin Based Distance Functions for Clustering,"
Proc. Int'l Conf. Machine Learning, 2004.- [20] T. Joachims, "Transductive Inference for Text Classification Using Support Vector Machines,"
Proc. Int'l Conf. Machine Learning, pp. 200-209, 1999.- [21] T. Joachims, "Transductive Learning via Spectral Graph Partitioning,"
Proc. Int'l Conf. Machine Learning, pp. 290-297, 2003.- [22] B. Kégl and L. Wang, "Boosting on Manifolds: Adaptive Regularization of Base Classifier,"
Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2005.- [23] B. Leskes, "The Value of Agreement, a New Boosting Algorithm,"
Proc. Int'l Conf. Computational Learning Theory, pp. 95-110, 2005.- [24] N. Loeff, D. Forsyth, and D. Ramachandran, "ManifoldBoost: Stagewise Function Approximation for Fully-, Semi- and Un-Supervised Learning,"
Proc. Int'l Conf. Machine Learning, pp. 600-607, 2008.- [25] P. Mallapragada, R. Jin, A. Jain, and Y. Liu, "SemiBoost: Boosting for Semi-Supervised Learning,"
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2000-2014, Nov. 2009.- [26] A. Martinez and R. Benavente, "The AR Face Database," CVC Technical Report 24, Purdue Univ., 1998.
- [27] L. Mason, P. Bartlett, J. Baxter, and M. Frean, "Functional Gradient Techniques for Combining Hypotheses,"
Advances in Large Margin Classifiers, MIT Press, 2000.- [28] K. Nigam, A. McCallum, S. Thrum, and T. Mitchell, "Using EM to Classify Text from Labeled and Unlabeled Documents,"
Machine Learning, vol. 39, pp. 103-134, 2000.- [29] A. Saffari, H. Grabner, and H. Bischof, "SERBoost: Semi-Supervised Boosting with Expectation Regularization,"
Proc. European Conf. Computer Vision, pp. III:588-601, 2008.- [30] A. Saffari, C. Leistner, and H. Bischof, "Regularized Multi-Class Semi-Supervised Boosting,"
Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2009.- [31] M. Seeger, "Learning with Labeled and Unlabeled Data," technical report, School of Informatics, The Univ. of Edinburgh, 2000.
- [32] P. Silapachote, D. Karuppiah, and A.R. Hanson, "Feature Selection Using Adaboost for Face Expression Recognition,"
Proc. IASTED Int'l Conf. Visualization, Image, and Image Processing, 2004.- [33] M. Szummer and T. Jaakkola, "Partially Labeled Classification with Markov Random Walks,"
Advances in Neural Information Processing Systems, vol. 15, MIT Press, 2001.- [34] M. Szummer and T. Jaakkola, "Information Regularization with Partially Labeled Data,"
Advances in Neural Information Processing Systems, vol. 15, MIT Press, 2003.- [35]
UCI Machine Learning Repository, http://www.ics.uci.edu/mlearnMLRepository.html , 2007.- [36] H. Valizadegan, R. Jin, and A. Jain, "Semi-Supervised Boosting for Multi-Class Classification,"
Proc. European Conf. Machine Learning and Knowledge Discovery in Databases, pp. 588-601, 2008.- [37] V.N. Vapnik,
Statistical Learning Theory. Wiley, 1998.- [38] A.M. Yip, C. Ding, and T.F. Chan, "Dynamic Cluster Formation Using Level Set Methods,"
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 6, pp. 877-889, June 2006.- [39] D. Zhou, O. Bousquet, T.N. Lal, J. Weston, and B. Scholkopf, "Learning with Local and Global Consistency,"
Advances in Neural Information Processing Systems, vol. 16, MIT Press, 2004.- [40] X. Zhu, "Semi-Supervised Learning Literature Survey," Technical Report TR-1530, Dept. of Computer Science, Univ. of Wisconsin, 2005.
- [41] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions,"
Proc. Int'l Conf. Machine Learning, pp. 912-919, 2003.- [42] X. Zhu and J. Lafferty, "Harmonic Mixtures: Combining Mixture Models and Graph-Based Methods for Inductive and Scalable Semi-Supervised Learning,"
Proc. Int'l Conf. Machine Learning, pp. 1052-1059, 2005.- [43] H. Zou, J. Zhu, and T. Hastie, "New Multicategory Boosting Algorithms Based on Multicategory Fisher-Consistent Losses,"
Annals of Applied Statistics, vol. 2, pp. 1290-1306, 2008. |