Improving Classifier Performance Using Data with Different Taxonomies
November 2011 (vol. 23 no. 11)
pp. 1668-1677
Tomoharu Iwata, NTT Communication Science Laboratories, Kyoto
Toshiyuki Tanaka, Kyoto University, Kyoto
Takeshi Yamada, NTT, Kyoto
Naonori Ueda, NTT Communication Science Laboratories, Kyoto
We propose a framework for improving classifier performance by effectively using auxiliary samples. The auxiliary samples are labeled not according to the target taxonomy into which we wish to classify samples, but according to classification schemes or taxonomies that differ from it. Our method finds a classifier by minimizing a weighted error over the target and auxiliary samples, with weights defined so that the weighted error approximates the expected error when samples are classified into the target taxonomy. Experiments using synthetic and text data show that, in most cases, our method significantly improves classifier performance compared to conventional data augmentation methods.
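The core idea in the abstract — augmenting scarce target-taxonomy data with down-weighted auxiliary samples and minimizing a weighted empirical error — can be sketched as weighted logistic-loss minimization. This is an illustrative sketch only: the fixed auxiliary weight of 0.5 and the heuristic mapping of auxiliary labels to target labels are assumptions for demonstration; the paper instead derives the weights so that the weighted error approximates the expected error under the target taxonomy.

```python
import numpy as np

def train_weighted_logreg(X, y, w, lr=0.1, iters=500):
    """Minimize the weighted logistic loss sum_i w_i * log(1 + exp(-y_i * theta.x_i))."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        margins = y * (X @ theta)
        # Gradient of the weighted loss w.r.t. theta.
        grad = -(X.T @ (w * y / (1.0 + np.exp(margins))))
        theta -= lr * grad / w.sum()
    return theta

rng = np.random.default_rng(0)

# A few target samples labeled in the target taxonomy (y in {-1, +1}).
X_tgt = rng.normal([[1.0, 1.0]] * 5 + [[-1.0, -1.0]] * 5, 0.5)
y_tgt = np.array([1] * 5 + [-1] * 5)

# Many auxiliary samples from a related but different taxonomy,
# heuristically mapped onto target labels (hypothetical mapping).
X_aux = rng.normal([[1.2, 0.8]] * 20 + [[-0.8, -1.2]] * 20, 0.5)
y_aux = np.array([1] * 20 + [-1] * 20)

X = np.vstack([X_tgt, X_aux])
y = np.concatenate([y_tgt, y_aux])
# Target samples get full weight; auxiliary samples are down-weighted
# (0.5 is an arbitrary illustrative value, not the paper's learned weight).
w = np.concatenate([np.ones(len(y_tgt)), np.full(len(y_aux), 0.5)])

theta = train_weighted_logreg(X, y, w)
```

Any classifier trained by empirical risk minimization with per-sample weights (e.g., a weighted SVM, as in references [21] and [22]) can play the same role; the contribution of the paper lies in how the weights are chosen, not in the base learner.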

[1] K. Nigam, A.K. McCallum, S. Thrun, and T.M. Mitchell, "Text Classification from Labeled and Unlabeled Documents Using EM," Machine Learning, vol. 39, nos. 2/3, pp. 103-134, 2000.
[2] H. Daume III and D. Marcu, "Domain Adaptation for Statistical Classifiers," J. Artificial Intelligence Research, vol. 26, pp. 101-126, 2006.
[3] D.A. Cohn, Z. Ghahramani, and M.I. Jordan, "Active Learning with Statistical Models," J. Artificial Intelligence Research, vol. 4, pp. 129-145, 1995.
[4] A.K. McCallum, R. Rosenfeld, T.M. Mitchell, and A.Y. Ng, "Improving Text Classification by Shrinkage in a Hierarchy of Classes," Proc. ICML '98: 15th Int'l Conf. Machine Learning, pp. 359-367, 1998.
[5] A. Fujino, N. Ueda, and K. Saito, "Semisupervised Learning for a Hybrid Generative/Discriminative Classifier Based on the Maximum Entropy Principle," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 424-437, Mar. 2008.
[6] R. Raina, A.Y. Ng, and D. Koller, "Constructing Informative Priors Using Transfer Learning," Proc. ICML '06: 23rd Int'l Conf. Machine Learning, pp. 713-720, 2006.
[7] R. Agrawal and R. Srikant, "On Integrating Catalogs," Proc. WWW '01: 10th Int'l Conf. World Wide Web, pp. 603-612, 2001.
[8] S. Sarawagi, S. Chakrabarti, and S. Godbole, "Cross-Training: Learning Probabilistic Mappings between Topics," Proc. KDD '03: Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 177-186, 2003.
[9] A. Doan, J. Madhavan, P. Domingos, and A. Halevy, "Learning to Map between Ontologies on the Semantic Web," Proc. WWW '02: 11th Int'l Conf. World Wide Web, pp. 662-673, 2002.
[10] R. Florian, A. Ittycheriah, H. Jing, and T. Zhang, "Named Entity Recognition through Classifier Combination," Proc. Seventh Conf. Natural Language Learning at North Am. Chapter of the Assoc. for Computational Linguistics—Human Language Technologies (HLT-NAACL), pp. 168-171, 2003.
[11] E. Gabrilovich and S. Markovitch, "Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization," J. Machine Learning Research, vol. 8, pp. 2297-2345, 2007.
[12] E. Gabrilovich and S. Markovitch, "Wikipedia-Based Semantic Interpretation for Natural Language Processing," J. Artificial Intelligence Research, vol. 34, pp. 443-498, 2009.
[13] G. Pandey, C.L. Myers, and V. Kumar, "Incorporating Functional Inter-Relationships into Protein Function Prediction Algorithms," BMC Bioinformatics, vol. 10, article no. 142, 2009.
[14] K. Nigam, J. Lafferty, and A. McCallum, "Using Maximum Entropy for Text Classification," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI '99) Workshop Machine Learning for Information Filtering, pp. 61-67, 1999.
[15] T. Iwata, K. Saito, and T. Yamada, "Recommendation Method for Extending Subscription Periods," Proc. KDD '06: 12th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 574-579, 2006.
[16] S.F. Chen and R. Rosenfeld, "A Gaussian Prior for Smoothing Maximum Entropy Models," Technical Report CMU-CS-99-108, Computer Science Dept., Carnegie Mellon Univ., 1999.
[17] D.C. Liu and J. Nocedal, "On the Limited Memory BFGS Method for Large Scale Optimization," Math. Programming, vol. 45, no. 3, pp. 503-528, 1989.
[18] V. Vapnik, Statistical Learning Theory. Wiley, 1998.
[19] E.J. Bredensteiner and K.P. Bennett, "Multicategory Classification by Support Vector Machines," Computational Optimization and Applications, vol. 12, nos. 1-3, pp. 53-79, 1999.
[20] J. Weston and C. Watkins, "Support Vector Machines for Multi-Class Pattern Recognition," Proc. Seventh European Symp. Artificial Neural Networks, pp. 219-224, 1999.
[21] C.-F. Lin and S.-D. Wang, "Fuzzy Support Vector Machines," IEEE Trans. Neural Networks, vol. 13, no. 2, pp. 464-471, Mar. 2002.
[22] X. Yang, Q. Song, and Y. Wang, "Weighted Support Vector Machine for Data Classification," Int'l J. Pattern Recognition and Artificial Intelligence, vol. 21, no. 5, pp. 961-976, 2007.
[23] K. Lang, "NewsWeeder: Learning to Filter Netnews," Proc. ICML '95: 12th Int'l Conf. Machine Learning, pp. 331-339, 1995.

Index Terms:
Transfer learning, semisupervised learning, text classification.
Tomoharu Iwata, Toshiyuki Tanaka, Takeshi Yamada, Naonori Ueda, "Improving Classifier Performance Using Data with Different Taxonomies," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1668-1677, Nov. 2011, doi:10.1109/TKDE.2010.170