Semisupervised Regression with Cotraining-Style Algorithms
November 2007 (vol. 19 no. 11)
pp. 1479-1493
The traditional setting of supervised learning requires a large number of labeled training examples to achieve good generalization. In many practical applications, however, unlabeled training examples are readily available while labeled ones are expensive to obtain, so semisupervised learning has attracted much attention. Previous research on semisupervised learning has focused mainly on semisupervised classification. Although regression is almost as important as classification, semisupervised regression remains largely understudied. In particular, although cotraining is a main paradigm of semisupervised learning, few works have been devoted to cotraining-style semisupervised regression algorithms. This paper proposes such an algorithm, Coreg. Coreg uses two regressors, each of which labels unlabeled data for the other; the confidence in labeling an unlabeled example is estimated by the reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that Coreg can effectively exploit unlabeled data to improve regression estimates.
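The confidence criterion described in the abstract translates directly into a procedure. Below is a minimal sketch of the Coreg idea in Python, assuming the two base regressors are kNN estimators made diverse by different Minkowski distance orders (p = 2 and p = 5); the neighborhood size k, the distance orders, the pool handling, and the iteration budget are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal sketch of the cotraining-style regression idea from the abstract:
# two kNN regressors pseudo-label pool points for each other, and a point is
# accepted only if it reduces the MSE over its labeled neighborhood.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def mse_reduction(X, y, x_u, y_u, k, p):
    """Confidence of pseudo-label y_u for x_u: the drop in mean squared
    error over the k labeled neighbors of x_u after (x_u, y_u) is added."""
    h = KNeighborsRegressor(n_neighbors=k, p=p).fit(X, y)
    idx = h.kneighbors(x_u, return_distance=False)[0]  # labeled neighborhood
    h_aug = KNeighborsRegressor(n_neighbors=k, p=p).fit(
        np.vstack([X, x_u]), np.append(y, y_u))
    before = np.mean((y[idx] - h.predict(X[idx])) ** 2)
    after = np.mean((y[idx] - h_aug.predict(X[idx])) ** 2)
    return before - after  # positive => the pseudo-label looks trustworthy

def coreg(X_lab, y_lab, X_unl, k=3, rounds=30):
    """Co-train two kNN regressors; returns an averaged predictor."""
    ps = (2, 5)  # diversity via different Minkowski orders (illustrative)
    data = [[X_lab.copy(), y_lab.copy()] for _ in ps]
    pool = list(range(len(X_unl)))
    for _ in range(rounds):
        taught = False
        for j in (0, 1):  # regressor j picks a point for regressor 1 - j
            Xj, yj = data[j]
            hj = KNeighborsRegressor(n_neighbors=k, p=ps[j]).fit(Xj, yj)
            best, best_y, best_gain = None, None, 0.0
            for u in pool:
                x_u = X_unl[u:u + 1]
                y_u = hj.predict(x_u)[0]
                gain = mse_reduction(Xj, yj, x_u, y_u, k, ps[j])
                if gain > best_gain:
                    best, best_y, best_gain = u, y_u, gain
            if best is not None:  # hand the confident point to the peer
                data[1 - j][0] = np.vstack([data[1 - j][0],
                                            X_unl[best:best + 1]])
                data[1 - j][1] = np.append(data[1 - j][1], best_y)
                pool.remove(best)
                taught = True
        if not taught:  # no point yields a positive MSE reduction: stop
            break
    h0 = KNeighborsRegressor(n_neighbors=k, p=ps[0]).fit(*data[0])
    h1 = KNeighborsRegressor(n_neighbors=k, p=ps[1]).fit(*data[1])
    return lambda X: (h0.predict(X) + h1.predict(X)) / 2  # average the pair
```

The two design points taken from the abstract are that each regressor hands its most confidently pseudo-labeled example to the other, and that confidence is measured as the MSE reduction over the labeled neighborhood of that example; the averaged final predictor and the early stop when no candidate yields a positive reduction are natural choices for a self-contained sketch, not details confirmed by the paper.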

[1] N. Abe and H. Mamitsuka, “Query Learning Strategies Using Boosting and Bagging,” Proc. 15th Int'l Conf. Machine Learning, pp. 1-9, 1998.
[2] S. Abney, “Bootstrapping,” Proc. 40th Ann. Meeting of the Assoc. for Computational Linguistics, pp. 360-367, 2002.
[3] M.-F. Balcan, A. Blum, and K. Yang, “Co-Training and Expansion: Towards Bridging Theory and Practice,” Advances in Neural Information Processing Systems 17, L.K. Saul, Y. Weiss, and L. Bottou, eds., pp. 89-96, MIT Press, 2005.
[4] M. Belkin and P. Niyogi, “Semi-Supervised Learning on Riemannian Manifolds,” Machine Learning, vol. 56, nos. 1-3, pp. 209-239, 2004.
[5] M. Belkin, P. Niyogi, and V. Sindhwani, “On Manifold Regularization,” Proc. 10th Int'l Workshop Artificial Intelligence and Statistics, pp. 17-24, 2005.
[6] C. Blake, E. Keogh, and C.J. Merz, UCI Repository of Machine Learning Databases, Dept. of Information and Computer Science, Univ. of Calif., Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[7] A. Blum and S. Chawla, “Learning from Labeled and Unlabeled Data Using Graph Mincuts,” Proc. 18th Int'l Conf. Machine Learning, pp. 19-26, 2001.
[8] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training,” Proc. 11th Ann. Conf. Computational Learning Theory, pp. 92-100, 1998.
[9] U. Brefeld, T. Gärtner, T. Scheffer, and S. Wrobel, “Efficient Co-Regularised Least Squares Regression,” Proc. 23rd Int'l Conf. Machine Learning, pp. 137-144, 2006.
[10] L. Breiman, “Bagging Predictors,” Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[11] Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, eds., MIT Press, 2006.
[12] M. Collins and Y. Singer, “Unsupervised Models for Named Entity Classification,” Proc. Joint SIGDAT Conf. Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 100-110, 1999.
[13] F.G. Cozman and I. Cohen, “Unlabeled Data Can Degrade Classification Performance of Generative Classifiers,” Proc. 15th Int'l Conf. Florida Artificial Intelligence Research Soc., pp. 327-331, 2002.
[14] B.V. Dasarathy, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE CS Press, 1991.
[15] S. Dasgupta, M. Littman, and D. McAllester, “PAC Generalization Bounds for Co-Training,” Advances in Neural Information Processing Systems 14, T.G. Dietterich, S. Becker, and Z. Ghahramani, eds., pp. 375-382, MIT Press, 2002.
[16] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, no. 1, pp. 1-38, 1977.
[17] T.G. Dietterich, “Ensemble Methods in Machine Learning,” LNCS 1867, J. Kittler and F. Roli, eds., pp. 1-15, Springer, 2000.
[18] M.O. Franz, Y. Kwon, C.E. Rasmussen, and B. Schölkopf, “Semi-Supervised Kernel Regression Using Whitened Function Classes,” LNCS 3175, C.E. Rasmussen, H.H. Bülthoff, B. Schölkopf, and M.A. Giese, eds., pp. 18-26, Springer, 2004.
[19] A. Fujino, N. Ueda, and K. Saito, “A Hybrid Generative/Discriminative Approach to Semi-Supervised Classifier Design,” Proc. 20th Nat'l Conf. Artificial Intelligence, pp. 764-769, 2005.
[20] S. Goldman and Y. Zhou, “Enhancing Supervised Learning with Unlabeled Data,” Proc. 17th Int'l Conf. Machine Learning, pp. 327-334, 2000.
[21] J.V. Hansen, “Combining Predictors: Meta Machine Learning Methods and Bias/Variance and Ambiguity Decompositions,” PhD dissertation, Dept. of Computer Science, Univ. of Aarhus, 2000.
[22] R. Hwa, M. Osborne, A. Sarkar, and M. Steedman, “Corrected Co-Training for Statistical Parsers,” Working Notes of the ICML '03 Workshop Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
[23] T. Joachims, “Transductive Inference for Text Classification Using Support Vector Machines,” Proc. 16th Int'l Conf. Machine Learning, pp. 200-209, 1999.
[24] M. Li and Z.-H. Zhou, “Improve Computer-Aided Diagnosis with Machine Learning Techniques Using Undiagnosed Samples,” to be published in IEEE Trans. Systems, Man, and Cybernetics—Part A, vol. 38, 2008.
[25] R.P. Lippmann, “Pattern Classification Using Neural Networks,” IEEE Comm., vol. 27, no. 11, pp. 47-64, 1989.
[26] D.J. Miller and H.S. Uyar, “A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data,” Advances in Neural Information Processing Systems 9, M. Mozer, M.I. Jordan, and T. Petsche, eds., pp. 571-577, MIT Press, 1997.
[27] K. Nigam and R. Ghani, “Analyzing the Effectiveness and Applicability of Co-Training,” Proc. Ninth ACM Int'l Conf. Information and Knowledge Management, pp. 86-93, 2000.
[28] K. Nigam, A.K. McCallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, vol. 39, nos. 2-3, pp. 103-134, 2000.
[29] D. Pierce and C. Cardie, “Limitations of Co-Training for Natural Language Learning from Large Data Sets,” Proc. Conf. Empirical Methods in Natural Language Processing, pp. 1-9, 2001.
[30] A. Pozdnoukhov and S. Bengio, “Semi-Supervised Kernel Methods for Regression Estimation,” Proc. IEEE Int'l Conf. Acoustics, Speech and Signal Processing, vol. 5, pp. 577-580, 2006.
[31] G. Ridgeway, D. Madigan, and T. Richardson, “Boosting Methodology for Regression Problems,” Proc. Seventh Int'l Workshop Artificial Intelligence and Statistics, pp. 152-161, 1999.
[32] E. Riloff and R. Jones, “Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping,” Proc. 16th Nat'l Conf. Artificial Intelligence, pp. 474-479, 1999.
[33] A. Sarkar, “Applying Co-Training Methods to Statistical Parsing,” Proc. Second Ann. Meeting of the North Am. Chapter of the Assoc. for Computational Linguistics, pp. 95-102, 2001.
[34] H. Seung, M. Opper, and H. Sompolinsky, “Query by Committee,” Proc. Fifth ACM Workshop Computational Learning Theory, pp. 287-294, 1992.
[35] B. Shahshahani and D. Landgrebe, “The Effect of Unlabeled Samples in Reducing the Small Sample Size Problem and Mitigating the Hughes Phenomenon,” IEEE Trans. Geoscience and Remote Sensing, vol. 32, no. 5, pp. 1087-1095, 1994.
[36] V. Sindhwani, P. Niyogi, and M. Belkin, “Beyond the Point Cloud: From Transductive to Semi-Supervised Learning,” Proc. 22nd Int'l Conf. Machine Learning, pp. 824-831, 2005.
[37] V. Sindhwani, P. Niyogi, and M. Belkin, “A Co-Regularized Approach to Semi-Supervised Learning with Multiple Views,” Working Notes of the ICML '05 Workshop Learning with Multiple Views, 2005.
[38] M. Steedman, M. Osborne, A. Sarkar, S. Clark, R. Hwa, J. Hockenmaier, P. Ruhlen, S. Baker, and J. Crim, “Bootstrapping Statistical Parsers from Small Data Sets,” Proc. 11th Conf. European Chapter of the Assoc. for Computational Linguistics, pp. 331-338, 2003.
[39] Q. Tian, J. Yu, Q. Xue, and N. Sebe, “A New Analysis of the Value of Unlabeled Data in Semi-Supervised Learning for Image Retrieval,” Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 1019-1022, 2004.
[40] V.N. Vapnik, Statistical Learning Theory. John Wiley & Sons, 1998.
[41] P. Vlachos, StatLib Project Repository, Dept. of Statistics, Carnegie Mellon Univ., http://lib.stat.cmu.edu, 2000.
[42] M. Wang, X.-S. Hua, Y. Song, L.-R. Dai, and H.-J. Zhang, “Semi-Supervised Kernel Regression,” Proc. Sixth IEEE Int'l Conf. Data Mining, pp. 1130-1135, 2006.
[43] W. Wang and Z.-H. Zhou, “Analyzing Co-Training Style Algorithms,” Proc. 18th European Conf. Machine Learning, 2007.
[44] J.A.E. Weston, M.O. Stitson, A. Gammerman, V. Vovk, and V. Vapnik, “Experiments with Support Vector Machines,” Technical Report CSD-TR-96-19, Royal Holloway Univ. of London, London, 1996.
[45] D. Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods,” Proc. 33rd Ann. Meeting of the Assoc. for Computational Linguistics, pp. 189-196, 1995.
[46] D. Zhou, B. Schölkopf, and T. Hofmann, “Semi-Supervised Learning on Directed Graphs,” Advances in Neural Information Processing Systems 17, L.K. Saul, Y. Weiss, and L. Bottou, eds., pp. 1633-1640, MIT Press, 2005.
[47] Y. Zhou and S. Goldman, “Democratic Co-Learning,” Proc. 16th IEEE Int'l Conf. Tools with Artificial Intelligence, pp. 594-602, 2004.
[48] Z.-H. Zhou, K.-J. Chen, and H.-B. Dai, “Enhancing Relevance Feedback in Image Retrieval Using Unlabeled Data,” ACM Trans. Information Systems, vol. 24, no. 2, pp. 219-244, 2006.
[49] Z.-H. Zhou, K.-J. Chen, and Y. Jiang, “Exploiting Unlabeled Data in Content-Based Image Retrieval,” Proc. 15th European Conf. Machine Learning, pp. 525-536, 2004.
[50] Z.-H. Zhou and M. Li, “Semi-Supervised Learning with Co-Training,” Proc. 19th Int'l Joint Conf. Artificial Intelligence, pp. 908-913, 2005.
[51] Z.-H. Zhou and M. Li, “Tri-Training: Exploiting Unlabeled Data Using Three Classifiers,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 11, pp. 1529-1541, Nov. 2005.
[52] Z.-H. Zhou, J. Wu, and W. Tang, “Ensembling Neural Networks: Many Could Be Better than All,” Artificial Intelligence, vol. 137, nos. 1-2, pp. 239-263, 2002.
[53] Z.-H. Zhou, D.-C. Zhan, and Q. Yang, “Semi-Supervised Learning with Very Few Labeled Training Examples,” Proc. 22nd AAAI Conf. Artificial Intelligence, pp. 675-680, 2007.
[54] X. Zhu, “Semi-Supervised Learning Literature Survey,” Technical Report 1530, Dept. of Computer Sciences, Univ. of Wisconsin, Madison, http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf, 2006.
[55] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions,” Proc. 20th Int'l Conf. Machine Learning, pp. 912-919, 2003.

Index Terms:
Data mining, Machine learning
Citation:
Zhi-Hua Zhou, Ming Li, "Semisupervised Regression with Cotraining-Style Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 11, pp. 1479-1493, Nov. 2007, doi:10.1109/TKDE.2007.190644