Multi-Output Regularized Feature Projection
December 2006 (vol. 18 no. 12)
pp. 1600-1613
Dimensionality reduction by feature projection is widely used in pattern recognition, information retrieval, and statistics. When outputs are available (e.g., regression values or classification results), it is often beneficial to use supervised projection, which is based not only on the inputs but also on the target values. While this applies to the single-output setting, we are more interested in applications with multiple outputs, where several tasks need to be learned simultaneously. In this paper, we introduce a novel projection approach called Multi-Output Regularized feature Projection (MORP), which preserves the information in the input features while capturing the correlations between inputs and outputs and (if applicable) among multiple outputs. This is done by introducing a latent variable model on the joint input-output space and minimizing the reconstruction errors for both inputs and outputs. The mappings can be found by solving a generalized eigenvalue problem and extend readily to nonlinear mappings. Prediction accuracy can be greatly improved by using the new features, since the structure of the outputs is exploited. We validate our approach in two applications. The first predicts users' preferences for a set of paintings. The second concerns image and text categorization, where each image (or document) may belong to multiple categories. The proposed algorithm produces very encouraging results in both settings.
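
The abstract gives no formulas, so the following is only a minimal sketch of one plausible reading of "minimizing the reconstruction errors for both inputs and outputs" via "a generalized eigenvalue problem". The function name `supervised_projection`, the trade-off weight `beta`, the ridge term `gamma`, and the specific matrices `A` and `B` are all assumptions made for illustration; they are not taken from the paper and should not be read as the authors' algorithm.

```python
import numpy as np

def supervised_projection(X, Y, n_components, beta=0.5, gamma=1e-3):
    """Illustrative linear supervised projection (assumed formulation).

    X: (n, d) inputs, Y: (n, m) outputs.
    Returns W of shape (d, n_components) and the matching eigenvalues.
    """
    d = X.shape[1]
    # Assumed joint similarity matrix: a weighted mix of input and output
    # Gram matrices, so latent directions must explain both X and Y.
    K = (1.0 - beta) * (X @ X.T) + beta * (Y @ Y.T)
    A = X.T @ K @ X                    # (d, d), symmetric positive semidefinite
    B = X.T @ X + gamma * np.eye(d)    # (d, d), positive definite due to ridge

    # Solve the generalized problem A w = lam * B w by whitening with the
    # Cholesky factor B = L L^T, which yields a standard symmetric problem.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    C = 0.5 * (C + C.T)                # remove round-off asymmetry
    evals, U = np.linalg.eigh(C)       # eigenvalues in ascending order

    # Keep the directions with the largest eigenvalues.
    order = np.argsort(evals)[::-1][:n_components]
    W = L_inv.T @ U[:, order]          # map back: w = L^{-T} u
    return W, evals[order]

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = X[:, :2] @ rng.standard_normal((2, 3)) + 0.1 * rng.standard_normal((50, 3))
W, evals = supervised_projection(X, Y, n_components=2)
Z = X @ W                              # projected features, shape (50, 2)
```

Under these assumptions, setting `beta=0` ignores the outputs entirely, while `beta` close to 1 lets the outputs dominate the choice of directions. A nonlinear variant would replace `X @ X.T` with a kernel matrix, which this sketch does not attempt.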

Index Terms:
Dimensionality reduction, supervised projection, feature transformation.
Shipeng Yu, Kai Yu, Volker Tresp, Hans-Peter Kriegel, "Multi-Output Regularized Feature Projection," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1600-1613, Dec. 2006, doi:10.1109/TKDE.2006.194