Tri-Training: Exploiting Unlabeled Data Using Three Classifiers
November 2005 (vol. 17 no. 11)
pp. 1529-1541
Abstract:
In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Semi-supervised learning algorithms such as co-training have therefore attracted much attention. This paper proposes a new co-training style semi-supervised learning algorithm, named tri-training. The algorithm generates three classifiers from the original labeled example set and then refines them using unlabeled examples: in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and an application to Web page classification indicate that tri-training can effectively exploit unlabeled data to enhance learning performance.
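The labeling rule described in the abstract translates directly into a short training loop. The following is a minimal sketch, not the authors' implementation: the scikit-learn-style interface, the decision-tree base learner, the bootstrap initialization via resample, the fixed round budget, and the function names tri_train and majority_vote are all illustrative assumptions, and the paper's exact error-rate conditions behind "under certain conditions" are simplified to a plain agreement check.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def tri_train(X_lab, y_lab, X_unlab, base=None, max_rounds=10):
    """Refine three bootstrap-trained classifiers using unlabeled data."""
    base = base if base is not None else DecisionTreeClassifier()

    # Generate three classifiers from bootstrap samples of the labeled set.
    clfs = [clone(base).fit(*resample(X_lab, y_lab)) for _ in range(3)]

    for _ in range(max_rounds):
        changed = False
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            pred_j = clfs[j].predict(X_unlab)
            pred_k = clfs[k].predict(X_unlab)
            agree = pred_j == pred_k  # the other two classifiers agree
            if not agree.any():
                continue
            # Retrain classifier i on the original labeled data plus the
            # examples pseudo-labeled by the agreement of the other two.
            # (The paper additionally filters these by estimated error
            # rates; that condition is omitted in this sketch.)
            X_i = np.vstack([X_lab, X_unlab[agree]])
            y_i = np.concatenate([y_lab, pred_j[agree]])
            clfs[i] = clone(base).fit(X_i, y_i)
            changed = True
        if not changed:  # no classifier was updated; stop early
            break
    return clfs


def majority_vote(clfs, X):
    # Combine the three refined classifiers; assumes integer class labels.
    votes = np.stack([c.predict(X) for c in clfs]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

A call such as tri_train(X_l, y_l, X_u) followed by majority_vote(clfs, X_test) reflects the combination step implied by the title: the final prediction is obtained from all three refined classifiers rather than from any single one.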


Index Terms:
Data mining, machine learning, learning from unlabeled data, semi-supervised learning, co-training, tri-training, Web page classification.
Citation:
Zhi-Hua Zhou, Ming Li, "Tri-Training: Exploiting Unlabeled Data Using Three Classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529-1541, Nov. 2005, doi:10.1109/TKDE.2005.186