Issue No. 6 - June 2013 (vol. 35), pp. 1284-1297
Bo Chen, The Chinese University of Hong Kong, Hong Kong
Wai Lam, The Chinese University of Hong Kong, Hong Kong
Ivor W. Tsang, Nanyang Technological University, Singapore
Tak-Lam Wong, The Hong Kong Institute of Education, Hong Kong
ABSTRACT
We propose a framework for adapting text mining models that discovers a low-rank shared concept space. The major characteristic of this concept space is that it explicitly minimizes the distribution gap between the source domain, which has sufficient labeled data, and the target domain, which has only unlabeled data, while simultaneously minimizing the empirical loss on the labeled data in the source domain. Our method can conduct the domain adaptation task both in the original feature space and in a transformed Reproducing Kernel Hilbert Space (RKHS) using the kernel trick. Theoretical analysis guarantees that the error of our adaptation model is bounded with respect to the embedded distribution gap and the empirical loss in the source domain. We have conducted extensive experiments on two common text mining problems, namely document classification and information extraction, to demonstrate the efficacy of the proposed framework.
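To make the two coupled objectives in the abstract concrete, the sketch below jointly learns a low-rank projection W and a classifier v by gradient descent on a source-domain squared loss plus a distribution-gap penalty, here the distance between projected domain means (the linear-kernel special case of maximum mean discrepancy). This is a minimal illustration under our own assumptions (synthetic data, squared loss, plain gradient descent), not the authors' actual optimization procedure; all variable names, sizes, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two domains (sizes are illustrative only).
Xs = rng.normal(0.0, 1.0, size=(100, 50))             # labeled source documents
ys = np.sign(Xs[:, 0] + 0.1 * rng.normal(size=100))   # synthetic binary labels
Xt = rng.normal(0.5, 1.0, size=(80, 50))              # unlabeled target documents

k = 10       # rank of the shared concept space
lam = 0.1    # weight of the distribution-gap term
eta = 1e-2   # gradient step size

W = rng.normal(scale=0.1, size=(50, k))  # low-rank projection into concept space
v = np.zeros(k)                          # linear classifier in concept space

for _ in range(500):
    Zs, Zt = Xs @ W, Xt @ W
    # Empirical (squared) loss on the labeled source data.
    resid = Zs @ v - ys
    # Distribution gap: distance between projected domain means,
    # i.e., the linear-kernel special case of maximum mean discrepancy.
    gap = Zs.mean(axis=0) - Zt.mean(axis=0)
    # Gradients of  (1/n)||Zs v - ys||^2 + lam * ||gap||^2  w.r.t. v and W.
    g_v = 2 * Zs.T @ resid / len(ys)
    g_W = 2 * Xs.T @ np.outer(resid, v) / len(ys) \
        + 2 * lam * np.outer(Xs.mean(axis=0) - Xt.mean(axis=0), gap)
    v -= eta * g_v
    W -= eta * g_W

# Predict target labels in the learned shared concept space.
yt_pred = np.sign(Xt @ W @ v)
```

In the kernelized variant described in the abstract, the linear maps Xs @ W and Xt @ W would be replaced by kernel matrices times a coefficient matrix via the kernel trick; that extension is omitted from this sketch.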
INDEX TERMS
Kernel, adaptation models, training, optimization, text mining, testing, industries, domain adaptation, low-rank concept extraction
CITATION
Bo Chen, Wai Lam, Ivor W. Tsang, Tak-Lam Wong, "Discovering Low-Rank Shared Concept Space for Adapting Text Mining Models", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 6, pp. 1284-1297, June 2013, doi:10.1109/TPAMI.2012.243