Semisupervised Learning for a Hybrid Generative/Discriminative Classifier Based on the Maximum Entropy Principle
March 2008 (vol. 30, no. 3)
pp. 424-437
This paper presents a method for designing semi-supervised classifiers trained on both labeled and unlabeled samples. We focus on probabilistic semi-supervised classifier design for multi-class, single-labeled classification problems and propose a hybrid approach that combines the strengths of generative and discriminative approaches. In our approach, we first consider a generative model trained with labeled samples and introduce a bias correction model, where the two models belong to the same model family but have different parameters. We then construct a hybrid classifier by combining these models on the basis of the maximum entropy principle. To apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results for four text data sets confirmed that the generalization ability of our hybrid classifier was much improved by using a large number of unlabeled samples for training when there were too few labeled samples to obtain good performance. We also confirmed that our hybrid approach significantly outperformed the purely generative and discriminative approaches when their individual performance was comparable. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions were different.
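For a concrete picture of the combination, the following Python sketch (not the authors' code) illustrates the log-linear form that the maximum entropy principle yields for two multinomial naive Bayes components, R(y|x) proportional to p(x|y; theta)^lam1 * p(x|y; psi)^lam2. The function names, the toy data, the fixed weights lam = (0.7, 0.3), and the omission of class priors are illustrative assumptions; in the paper, the combination weights are estimated from labeled samples and the bias correction model is trained with the unlabeled samples.

    import numpy as np

    def fit_nb(X, y, n_classes, alpha=1.0):
        # Multinomial naive Bayes: per-class log word-probability vectors,
        # estimated from word-count rows of X with add-alpha smoothing.
        logp = np.zeros((n_classes, X.shape[1]))
        for k in range(n_classes):
            counts = X[y == k].sum(axis=0) + alpha
            logp[k] = np.log(counts / counts.sum())
        return logp

    def hybrid_posterior(X, logp_gen, logp_bias, lam):
        # Log-linear combination R(y|x) propto
        #   p(x|y; theta)^lam1 * p(x|y; psi)^lam2  (class priors omitted).
        scores = lam[0] * (X @ logp_gen.T) + lam[1] * (X @ logp_bias.T)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(scores)
        return post / post.sum(axis=1, keepdims=True)

    # Toy usage: random word-count matrices standing in for documents.
    rng = np.random.default_rng(0)
    X_lab = rng.poisson(1.0, size=(20, 50))
    y_lab = rng.integers(0, 2, size=20)
    logp_gen = fit_nb(X_lab, y_lab, n_classes=2)
    # Stand-in for the bias correction model; in the paper it shares the
    # naive Bayes form but is trained with the unlabeled samples.
    logp_bias = fit_nb(X_lab, y_lab, n_classes=2, alpha=10.0)
    R = hybrid_posterior(X_lab, X_lab @ np.zeros((50, 2)) * 0 + logp_gen if False else logp_gen, logp_bias, lam=(0.7, 0.3))
    print(R.argmax(axis=1))  # predicted class labels

In practice, the weights lam would not be fixed by hand as above but estimated, for instance, by maximizing a regularized conditional likelihood of the labeled samples, which is the role the discriminative step plays in the hybrid approach.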

References:
[1] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM,” Machine Learning, vol. 39, pp. 103-134, 2000.
[2] Y. Grandvalet and Y. Bengio, “Semi-Supervised Learning by Entropy Minimization,” Advances in Neural Information Processing Systems 17, MIT Press, pp. 529-536, 2005.
[3] M. Szummer and T. Jaakkola, “Kernel Expansions with Unlabeled Examples,” Advances in Neural Information Processing Systems 13, MIT Press, pp. 626-632, 2001.
[4] M. Inoue and N. Ueda, “Exploitation of Unlabeled Sequences in Hidden Markov Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1570-1581, Dec. 2003.
[5] M.R. Amini and P. Gallinari, “Semi-Supervised Logistic Regression,” Proc. 15th European Conf. Artificial Intelligence, pp. 390-394, 2002.
[6] T. Joachims, “Transductive Inference for Text Classification Using Support Vector Machines,” Proc. 16th Int'l Conf. Machine Learning, pp. 200-209, 1999.
[7] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training,” Proc. 11th Ann. Conf. Computational Learning Theory, vol. 11, 1998.
[8] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions,” Proc. 20th Int'l Conf. Machine Learning, pp. 912-919, 2003.
[9] M. Seeger, “Learning with Labeled and Unlabeled Data,” technical report, Univ. of Edinburgh, 2001.
[10] A.Y. Ng and M.I. Jordan, “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes,” Advances in Neural Information Processing Systems 14, pp. 841-848, MIT Press, 2002.
[11] S. Tong and D. Koller, “Restricted Bayes Optimal Classifiers,” Proc. 17th Nat'l Conf. Artificial Intelligence, pp. 658-664, 2000.
[12] R. Raina, Y. Shen, A.Y. Ng, and A. McCallum, “Classification with Hybrid Generative/Discriminative Models,” Advances in Neural Information Processing Systems 16, MIT Press, 2004.
[13] A.L. Berger, S.A. Della Pietra, and V.J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996.
[14] A. Fujino, N. Ueda, and K. Saito, “A Hybrid Generative/Discriminative Approach to Semi-Supervised Classifier Design,” Proc. 20th Nat'l Conf. Artificial Intelligence, pp. 764-769, 2005.
[15] A. Fujino, N. Ueda, and K. Saito, “Semi-Supervised Learning on Hybrid Generative/Discriminative Models,” Information Technology Letters, vol. 4, pp. 161-164, 2005 (in Japanese).
[16] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc. B, vol. 39, pp. 1-38, 1977.
[17] F.G. Cozman and I. Cohen, “Unlabeled Data Can Degrade Classification Performance of Generative Classifiers,” Proc. 15th Int'l Florida Artificial Intelligence Research Soc. Conf., pp. 327-331, 2002.
[18] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[19] K. Nigam, J. Lafferty, and A. McCallum, “Using Maximum Entropy for Text Classification,” Proc. Int'l Joint Conf. Artificial Intelligence Workshop Machine Learning for Information Filtering, pp. 61-67, 1999.
[20] S.F. Chen and R. Rosenfeld, “A Gaussian Prior for Smoothing Maximum Entropy Models,” technical report, Carnegie Mellon Univ., 1999.
[21] D.C. Liu and J. Nocedal, “On the Limited Memory BFGS Method for Large Scale Optimization,” Math. Programming B, vol. 45, no. 3, pp. 503-528, 1989.
[22] A. Fujino, N. Ueda, and K. Saito, “A Hybrid Generative/Discriminative Approach to Text Classification with Additional Information,” Information Processing and Management, vol. 43, pp. 379-392, 2007.
[23] Y. Yang and X. Liu, “A Re-Examination of Text Categorization Methods,” Proc. 22nd ACM Int'l Conf. Research and Development in Information Retrieval, pp. 42-49, 1999.
[24] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[25] G. Forman, “An Extensive Empirical Study of Feature Selection Metrics for Text Classification,” J. Machine Learning Research, vol. 3, pp. 1289-1305, 2003.
[26] R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, “On Feature Distributional Clustering for Text Classification,” Proc. 24th ACM Int'l Conf. Research and Development in Information Retrieval, pp. 146-153, 2001.
[27] J. Demšar, “Statistical Comparisons of Classifiers over Multiple Data Sets,” J. Machine Learning Research, vol. 7, pp. 1-30, 2006.
[28] D.J. Miller and H.S. Uyar, “A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data,” Advances in Neural Information Processing Systems 9, pp. 571-577, MIT Press, 1997.
[29] N.V. Chawla and G. Karakoulas, “Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains,” J. Artificial Intelligence Research, vol. 23, pp. 331-366, 2005.
[30] I.S. Dhillon and D.S. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, vol. 42, pp. 143-175, 2001.
[31] C.D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
[32] F. Jelinek and R. Mercer, “Interpolated Estimation of Markov Source Parameters from Sparse Data,” Pattern Recognition in Practice, E.S. Gelsema and L.N. Kanal, eds., pp. 381-402, North Holland Publishing, 1980.

Index Terms:
generative model, maximum entropy principle, bias correction, unlabeled samples, text classification
Citation:
Akinori Fujino, Naonori Ueda, Kazumi Saito, "Semisupervised Learning for a Hybrid Generative/Discriminative Classifier Based on the Maximum Entropy Principle," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 424-437, March 2008, doi:10.1109/TPAMI.2007.70710