Issue No. 10 - Oct. 2012 (vol. 24)
pp. 1789-1802
Brian Quanz, University of Kansas, Lawrence
Jun (Luke) Huan, University of Kansas, Lawrence
Meenakshi Mishra, University of Kansas, Lawrence
ABSTRACT
Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge between sources of data, particularly when ground-truth information for the new modeling task is scarce or expensive to collect, so that leveraging any auxiliary sources of data becomes a necessity. Toward seamless knowledge transfer among tasks, effective representation of the data is a critical but not yet fully explored research area for the data engineer and data miner. Here, we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing on the identification of clusters shared between the source and target data when the two may have different distributions. We point out cases where a direct application of sparse coding leads to a failure of knowledge transfer. We then present the details of our extension to sparse coding, which incorporates distribution distance estimates for the embedded data, and show that the proposed algorithm can overcome the shortcomings of standard sparse coding on synthetic data and achieve improved predictive performance on a real-world chemical toxicity transfer learning task.
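To make the idea concrete, the sketch below illustrates (it is not the authors' implementation) sparse coding augmented with a distribution-distance penalty on the embedded data. It assumes a linear-kernel mean-difference term as a simple stand-in for a kernel two-sample distance such as MMD, learns a shared dictionary over stacked source and target samples with ISTA-style code updates, and uses a plain least-squares dictionary step; the function name sparse_code_with_mmd and all parameter choices are hypothetical.

```python
# Minimal sketch: sparse coding with a distribution-distance penalty between
# source and target codes. Illustrative only; hyperparameters are arbitrary.
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_code_with_mmd(X_src, X_tgt, n_atoms=32, lam=0.1, gamma=1.0,
                         n_iter=50, seed=0):
    """Learn a shared dictionary D and sparse codes S for stacked source and
    target data, while penalizing a linear-kernel proxy of the distance
    between the source and target code distributions.

    X_src: (d, n_s) source samples as columns; X_tgt: (d, n_t) target samples.
    Returns D (d, n_atoms), S_src (n_atoms, n_s), S_tgt (n_atoms, n_t).
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([X_src, X_tgt])
    d, n = X.shape
    n_s = X_src.shape[1]
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    S = np.zeros((n_atoms, n))

    for _ in range(n_iter):
        # Code update: one ISTA step on
        # 0.5*||X - DS||_F^2 + gamma*||mean(S_src) - mean(S_tgt)||^2 + lam*||S||_1
        grad = D.T @ (D @ S - X)                     # reconstruction gradient
        mu_diff = S[:, :n_s].mean(axis=1) - S[:, n_s:].mean(axis=1)
        grad[:, :n_s] += 2.0 * gamma * mu_diff[:, None] / n_s
        grad[:, n_s:] -= 2.0 * gamma * mu_diff[:, None] / (n - n_s)
        L = np.linalg.norm(D, 2) ** 2 + 4.0 * gamma  # crude Lipschitz bound
        S = soft_threshold(S - grad / L, lam / L)

        # Dictionary update: least squares fit, then renormalize the atoms.
        D = X @ np.linalg.pinv(S)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12

    return D, S[:, :n_s], S[:, n_s:]
```

Under this setup, the returned source codes could be used to train any standard classifier on the labeled source data, which is then applied to the target codes produced by the same dictionary.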
INDEX TERMS
Encoding, Vectors, Knowledge transfer, Kernel, Feature extraction, Estimation, Equations, Low-quality data, Transfer learning, Sparse coding
CITATION
Brian Quanz, Jun (Luke) Huan, Meenakshi Mishra, "Knowledge Transfer with Low-Quality Data: A Feature Extraction Issue," IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 10, pp. 1789-1802, Oct. 2012, doi:10.1109/TKDE.2012.75