Issue No. 10, October 2010 (vol. 22)
pp. 1345-1359
Sinno Jialin Pan , Hong Kong University of Science and Technology, Hong Kong
Qiang Yang , Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress in transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, sample selection bias, and covariate shift. We also explore some potential future issues in transfer learning research.
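The abstract names covariate shift as one of the settings related to transfer learning: the input distribution differs between training and test data while the conditional distribution of the label given the input stays the same. A standard remedy is importance weighting, where each training example is reweighted by the density ratio p_target(x) / p_source(x). The sketch below only illustrates that general idea; it is not a method proposed in this survey. It assumes, for illustration, that both input densities are known Gaussians with hypothetical parameters, that the true function is y = x², and that the model is a straight line fit by weighted least squares. In practice the density ratio is unknown and has to be estimated, typically from unlabeled target data.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source, target):
    """w(x) = p_target(x) / p_source(x); both densities are assumed known here."""
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]


def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for the model y = a * x + b."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a linear model (a, b) on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_fn(x):
    """Hypothetical ground truth; a straight line cannot fit it everywhere."""
    return x * x


def demo(n=2000, seed=0):
    """Return (unweighted target MSE, importance-weighted target MSE)."""
    rng = random.Random(seed)
    source, target = (0.0, 1.0), (1.5, 0.5)  # hypothetical (mean, std) pairs
    # Labeled training inputs are drawn from the source distribution only.
    xs_train = [rng.gauss(*source) for _ in range(n)]
    ys_train = [true_fn(x) for x in xs_train]
    # Inputs from the target distribution, used here only for evaluation.
    xs_test = [rng.gauss(*target) for _ in range(n)]
    ys_test = [true_fn(x) for x in xs_test]

    plain = weighted_linear_fit(xs_train, ys_train, [1.0] * n)
    weighted = weighted_linear_fit(
        xs_train, ys_train, importance_weights(xs_train, source, target)
    )
    return mse(plain, xs_test, ys_test), mse(weighted, xs_test, ys_test)


if __name__ == "__main__":
    plain_err, weighted_err = demo()
    print(f"target MSE, unweighted fit:          {plain_err:.3f}")
    print(f"target MSE, importance-weighted fit: {weighted_err:.3f}")
```

Because the linear model is misspecified, the unweighted fit is tuned to the region around the source mean, whereas the weighted fit emphasizes source points that look like target data, so its error on the target region should be much lower. The target density is chosen narrower than the source density so that the weight ratio stays bounded; if the target had heavier tails than the source, a few weights could become very large and make the estimate unstable.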
INDEX TERMS
Transfer learning, survey, machine learning, data mining.
CITATION
Sinno Jialin Pan, Qiang Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, October 2010, doi:10.1109/TKDE.2009.191