Issue No. 7 - July 2010 (vol. 22)
pp. 957-968
Tsuyoshi Kato, Ochanomizu University, Tokyo
Hisashi Kashima, IBM Research, Yamato
Masashi Sugiyama, Tokyo Institute of Technology, Tokyo
Kiyoshi Asai, The University of Tokyo, Chiba
ABSTRACT
When we have several related tasks, solving them simultaneously has been shown to be more effective than solving them individually. This approach is called multitask learning (MTL). In this paper, we propose a novel MTL algorithm. Our method controls the relatedness among the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines and show that the optimization problem can be cast as a second-order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated in ordinal regression, link prediction, and collaborative filtering, each of which can be formulated as a structured multitask problem.
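To make the idea concrete, the following is a minimal sketch (in Python with CVXPY) of a linear multitask SVM in which every pair of tasks declared as related is pushed toward a similar solution through a second-order cone constraint on the difference of their weight vectors, so the joint problem remains a second-order cone program. This illustrates only the general principle, not the paper's exact formulation; the function name related_tasks_svm, the constants C and rho, and the relatedness list related_pairs are assumptions introduced for this example.

import cvxpy as cp
import numpy as np

def related_tasks_svm(X, y, related_pairs, C=1.0, rho=0.5):
    # X[t]: (n_t, d) inputs of task t; y[t]: (n_t,) labels in {-1, +1}.
    # related_pairs: list of (s, t) index pairs declared to be related.
    T = len(X)
    d = X[0].shape[1]
    W = cp.Variable((T, d))  # one weight vector per task
    b = cp.Variable(T)       # one bias per task
    hinge = 0
    for t in range(T):
        margins = cp.multiply(y[t], X[t] @ W[t] + b[t])
        hinge += cp.sum(cp.pos(1 - margins))  # hinge loss of task t
    # Joint objective: per-task regularization plus the total hinge loss.
    objective = cp.Minimize(0.5 * cp.sum_squares(W) + C * hinge)
    # Local control of relatedness: each related pair must end up with
    # similar weight vectors; ||w_s - w_t|| <= rho is a second-order cone
    # constraint, so the whole problem is an SOCP.
    constraints = [cp.norm(W[s] - W[t], 2) <= rho for (s, t) in related_pairs]
    cp.Problem(objective, constraints).solve()
    return W.value, b.value

if __name__ == "__main__":
    # Tiny synthetic check: two related binary tasks in two dimensions.
    rng = np.random.default_rng(0)
    X = [rng.normal(size=(20, 2)) for _ in range(2)]
    y = [np.sign(Xt[:, 0] + 0.1 * rng.normal(size=20)) for Xt in X]
    W, b = related_tasks_svm(X, y, related_pairs=[(0, 1)])
    print(W)

Shrinking rho ties related tasks more tightly together, while pairs left out of related_pairs remain free to differ, which mirrors the local control of task relatedness described in the abstract.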
INDEX TERMS
Multitask learning, second-order cone programming, ordinal regression, link prediction, collaborative filtering.
CITATION
Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai, "Conic Programming for Multitask Learning," IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 7, pp. 957-968, July 2010, doi:10.1109/TKDE.2009.142
REFERENCES
[1] Y. Amit, M. Fink, N. Srebro, and S. Ullman, "Uncovering Shared Structures in Multiclass Classification," Proc. 24th Int'l Conf. Machine Learning, pp. 17-24, 2007.
[2] R. Caruana, "Multitask Learning," Machine Learning, vol. 28, no. 1, pp. 41-75, 1997.
[3] S. Thrun and L. Pratt, Learning to Learn. Springer, 1997.
[4] J. Baxter, "A Model of Inductive Bias Learning," J. Artificial Intelligence Research, vol. 12, pp. 149-198, 2000.
[5] B. Bakker and T. Heskes, "Task Clustering and Gating for Bayesian Multitask Learning," J. Machine Learning Research, vol. 4, pp. 83-99, 2003.
[6] T. Evgeniou and M. Pontil, "Regularized Multitask Learning," Proc. ACM SIGKDD, pp. 109-117, 2004.
[7] T. Evgeniou, C.A. Micchelli, and M. Pontil, "Learning Multiple Tasks with Kernel Methods," J. Machine Learning Research, vol. 6, pp. 615-637, 2005.
[8] N.D. Lawrence and J.C. Platt, "Learning to Learn with the Informative Vector Machine," Proc. 21st Int'l Conf. Machine Learning, pp. 512-519, 2004.
[9] C.A. Micchelli and M. Pontil, "Kernels for Multi-Task Learning," Advances in Neural Information Processing Systems, vol. 17, pp. 921-928, MIT Press, 2005.
[10] K. Yu, V. Tresp, and A. Schwaighofer, "Learning Gaussian Processes from Multiple Tasks," Proc. 22nd Int'l Conf. Machine Learning, pp. 1012-1019, 2005.
[11] E.V. Bonilla, F.V. Agakov, and C.K.I. Williams, "Kernel Multi-Task Learning Using Task-Specific Features," Proc. 11th Int'l Conf. Artificial Intelligence and Statistics, pp. 43-50, 2007.
[12] K. Tsuda and W.S. Noble, "Learning Kernels from Biological Networks by Maximizing Entropy," Bioinformatics, vol. 20, suppl. 1, pp. i326-i333, 2004.
[13] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge Univ. Press, 2004.
[14] A. Shashua and A. Levin, "Ranking with Large Margin Principle: Two Approaches," Advances in Neural Information Processing Systems, vol. 15, pp. 937-944, MIT Press, 2003.
[15] T. Kato, K. Tsuda, and K. Asai, "Selective Integration of Multiple Biological Data for Supervised Network Inference," Bioinformatics, vol. 21, pp. 2488-2495, 2005.
[16] J.-P. Vert and Y. Yamanishi, "Supervised Graph Inference," Advances in Neural Information Processing Systems, vol. 17. MIT Press, 2005.
[17] Y. Yamanishi, J.P. Vert, and M. Kanehisa, "Supervised Enzyme Network Inference from the Integration of Genomic Data and Chemical Information," Bioinformatics, vol. 21, suppl. 1, pp. i468-i477, June 2005.
[18] K. Bleakley, G. Biau, and J.-P. Vert, "Supervised Reconstruction of Biological Networks with Local Models," Bioinformatics, vol. 23, no. 13, pp. i57-i65, 2007.
[19] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. for Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[20] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram, "Multi-Task Learning for Classification with Dirichlet Process Priors," J. Machine Learning Research, vol. 8, pp. 35-63, 2007.
[21] V.N. Vapnik, Statistical Learning Theory. Wiley, 1998.
[22] B. Borchers, "CSDP, a C Library for Semidefinite Programming," Optimization Methods and Software, vol. 11, no. 1, pp. 613-623, 1999.
[23] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret, "Applications of Second-Order Cone Programming," Linear Algebra and its Applications, vol. 284, pp. 193-228, 1998.
[24] X. Zhu, J. Kandola, Z. Ghahramani, and J. Lafferty, "Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning," Advances in Neural Information Processing Systems, vol. 17, pp. 1641-1648, MIT Press, 2004.
[25] D. Haussler, "Convolution Kernels on Discrete Structures," Technical Report UCSC-CRL-99-10, UC Santa Cruz, July 1999.
[26] T. Jaakkola and D. Haussler, "Exploiting Generative Models in Discriminative Classifiers," Advances in Neural Information Processing Systems, M.S. Kearns, S.A. Solla, and D.A. Cohn, eds., vol. 11, pp. 487-493, MIT Press, 1999.
[27] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, "Text Classification Using String Kernels," J. Machine Learning Research, vol. 2, pp. 419-444, 2002.
[28] R.I. Kondor and J. Lafferty, "Diffusion Kernels on Graphs and Other Discrete Input Spaces," Proc. 19th Int'l Conf. Machine Learning, pp. 315-322, 2002.
[29] C. Leslie, E. Eskin, and W.S. Noble, "The Spectrum Kernel: A String Kernel for SVM Protein Classification," Proc. Pacific Symp. Biocomputing, pp. 566-575, 2002.
[30] H. Kashima and T. Koyanagi, "Kernels for Semi-Structured Data," Proc. 19th Int'l Conf. Machine Learning, pp. 291-298, 2002.
[31] T. Gärtner, "A Survey of Kernels for Structured Data," SIGKDD Explorations, vol. 5, no. 1, pp. S268-S275, 2003.
[32] T. Gärtner, P. Flach, and S. Wrobel, "On Graph Kernels: Hardness Results and Efficient Alternatives," Proc. 16th Ann. Conf. Computational Learning Theory, pp. 129-143, 2003.
[33] H. Kashima, K. Tsuda, and A. Inokuchi, "Marginalized Kernels between Labeled Graphs," Proc. 20th Int'l Conf. Machine Learning, pp. 321-328, 2003.
[34] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network Motifs: Simple Building Blocks of Complex Networks," Science, vol. 298, pp. 824-827, Jan. 2002.
[35] A. Andreeva, D. Howorth, S.E. Brenner, T.J.P. Hubbard, C. Chothia, and A.G. Murzin, "SCOP Database in 2004: Refinements Integrate Structure and Sequence Family Data," Nucleic Acids Research, vol. 32, pp. D226-D229, 2004.
[36] E.L. Lehmann and J.P. Romano, Testing Statistical Hypotheses. Springer, 2005.
[37] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, "Comparative Assessment of Large-Scale Data Sets of Protein-Protein Interactions," Nature, vol. 417, pp. 399-403, 2002.
[38] G.R.G. Lanckriet, T.D. Bie, N. Cristianini, M.I. Jordan, and W.S. Noble, "A Statistical Framework for Genomic Data Fusion," Bioinformatics, vol. 20, pp. 2626-2635, 2004.
[39] M. Kurucz, A.A. Benczúr, T. Kiss, I. Nagy, A. Szabó, and B. Torma, "KDD Cup 2007 Task1 Winner Report," ACM SIGKDD Explorations Newsletter, vol. 9, no. 2, pp. 53-56, 2008.
[40] N. Srebro, J.D.M. Rennie, and T.S. Jaakkola, "Maximum-Margin Matrix Factorization," Advances in Neural Information Processing Systems, L. Saul, Y. Weiss, and L. Bottou, eds., vol. 17, pp. 1329-1336, MIT Press, 2005.
[41] E. Bonilla, K.M. Chai, and C. Williams, "Multi-Task Gaussian Process Prediction," Advances in Neural Information Processing Systems, J. Platt, D. Koller, Y. Singer, and S. Roweis, eds., vol. 20, pp. 153-160, MIT Press, 2008.
[42] X. Liao, Y. Xue, and L. Carin, "Logistic Regression with an Auxiliary Data Source," Proc. 22nd Int'l Conf. Machine Learning, pp. 505-512, 2005.
[43] Q. Liu, X. Liao, and L. Carin, "Semi-Supervised Multitask Learning," Advances in Neural Information Processing Systems, J. Platt, D. Koller, Y. Singer, and S. Roweis, eds., vol. 20, pp. 937-944, MIT Press, 2008.
[44] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, and R.C. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[45] D.M.J. Tax and R.P.W. Duin, "Support Vector Data Description," Machine Learning, vol. 54, no. 1, pp. 45-66, 2004.
[46] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support Vector Regression Machines," Advances in Neural Information Processing Systems, M.C. Mozer, M.I. Jordan, and T. Petsche, eds., vol. 9, pp. 155-161, MIT Press, 1997.
[47] C.-C. Chang and C.-J. Lin, "Training ν-Support Vector Regression: Theory and Algorithms," Neural Computation, vol. 14, no. 8, pp. 1959-1977, 2002.
[48] B. Schölkopf, A. Smola, R. Williamson, and P. Bartlett, "New Support Vector Algorithms," Neural Computation, vol. 12, no. 5, pp. 1207-1245, 2000.
[49] F. Perez-Cruz, J. Weston, D.J.L. Herrmann, and B. Schölkopf, "Extension of the ν-SVM Range for Classification," Advances in Learning Theory: Methods, Models and Applications, J.A.K. Suykens, G. Horvath, S. Basu, C. Micchelli, and J. Vandewalle, eds., vol. 190, pp. 179-196, IOS Press, 2003.
[50] P.H. Chen, C.J. Lin, and B. Schölkopf, "A Tutorial on ν-Support Vector Machines," Applied Stochastic Models in Business and Industry, vol. 21, no. 2, pp. 111-136, 2005.
[51] B. Efron, T. Hastie, R. Tibshirani, and I. Johnstone, "Least Angle Regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004.
[52] T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu, "The Entire Regularization Path for the Support Vector Machine," J. Machine Learning Research, vol. 5, pp. 1391-1415, 2004.
[53] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L.E. Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," J. Machine Learning Research, vol. 5, pp. 27-72, Jan. 2004.
[54] T. Kato, H. Kashima, and M. Sugiyama, "Integration of Multiple Networks for Robust Label Propagation," Proc. 2008 SIAM Int'l Conf. Data Mining (SDM '08), pp. 716-726, 2008.