Semisupervised Multitask Learning
June 2009 (vol. 31 no. 6)
pp. 1074-1086
Qiuhua Liu, Duke University, Durham
Xuejun Liao, Duke University, Durham
Hui Li Carin, Signal Innovations Group, Inc., Durham
Jason R. Stack, Office of Naval Research, Arlington
Lawrence Carin, Duke University, Durham
Context plays an important role in classification, and in this paper we examine context from two perspectives. First, the classification of items within a single task is placed within the context of distinct concurrent or previous classification tasks (multiple distinct data collections). This is referred to as multi-task learning (MTL) and is implemented here statistically, using a simplified form of the Dirichlet process. Second, when performing many classification tasks, one has simultaneous access to all unlabeled data that must be classified, so there is an opportunity to place the classification of any one feature vector within the context of all unlabeled feature vectors; this is referred to as semi-supervised learning. We integrate MTL and semi-supervised learning into a single framework, thereby exploiting both forms of contextual information. Results are presented on a "toy" example to demonstrate the concept, and the algorithm is also applied to three real data sets.

Index Terms:
Machine learning, pattern recognition
Qiuhua Liu, Xuejun Liao, Hui Li Carin, Jason R. Stack, Lawrence Carin, "Semisupervised Multitask Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1074-1086, June 2009, doi:10.1109/TPAMI.2008.296