Issue No. 11 - Nov. 2012 (vol. 34)
pp. 2216-2232
Tingting Mu , University of Manchester, Manchester
John Yannis Goulermas , University of Liverpool, Liverpool
Jun'ichi Tsujii , Microsoft Research Asia, China
Sophia Ananiadou , University of Manchester, Manchester
ABSTRACT
This paper addresses supervised and semi-supervised dimensionality reduction (DR) by generating spectral embeddings from multi-output data based on pairwise proximity information. Two flexible and generic frameworks are proposed to achieve supervised DR (SDR) for multilabel classification. One is able to extend any existing single-label SDR to multilabel via sample duplication, referred to as MESD. The other is a multilabel design framework that tackles the SDR problem by computing weight (proximity) matrices based on simultaneous feature and label information, referred to as MOPE, as a generalization of many current techniques. A diverse set of schemes for label-based proximity calculation, as well as a mechanism for combining label-based and feature-based weight information by considering information importance and prioritization, are proposed for MOPE. Additionally, we summarize many current spectral methods for unsupervised DR (UDR), single/multilabel SDR, and semi-supervised DR (SSDR), and express them under a common template representation as a general guide for researchers in the field. We also propose a general framework for achieving SSDR by combining existing SDR and UDR models, as well as a procedure for reducing the computational cost via learning with a target set of relation features. The effectiveness of our proposed methodologies is demonstrated through experiments on document collections for multilabel text categorization from the natural language processing domain.
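The core idea described above — building a combined weight matrix from feature-based and label-based proximities and extracting a spectral embedding from its graph Laplacian — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the RBF feature affinity, the cosine label affinity, and the blending parameter `beta` are assumptions standing in for the paper's richer set of weighting and prioritization schemes, and all names are hypothetical.

```python
import numpy as np

def spectral_embedding(X, Y, beta=0.5, sigma=1.0, dim=2):
    """Illustrative proximity-based supervised spectral DR.

    X: (n, d) feature matrix; Y: (n, c) binary multilabel matrix.
    beta blends the label-based and feature-based weight matrices
    (a simple convex combination; the paper considers more general
    prioritization mechanisms).
    """
    # Feature-based proximity: heat-kernel (RBF) affinities.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W_feat = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Label-based proximity: cosine similarity of label vectors,
    # one of many possible label-weighting schemes.
    norms = np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12
    Yn = Y / norms
    W_lab = Yn @ Yn.T
    # Combine the two sources of proximity information.
    W = beta * W_lab + (1.0 - beta) * W_feat
    # Unnormalized graph Laplacian and its eigendecomposition.
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    # Discard the trivial (near-constant) eigenvector; the next
    # `dim` eigenvectors give the low-dimensional embedding.
    return vecs[:, 1:dim + 1]
```

Points with similar labels and similar features receive large combined weights and are therefore placed close together in the embedding; setting `beta=0` recovers a purely unsupervised Laplacian-eigenmap-style embedding.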
INDEX TERMS
Laplace equations, Optimization, Principal component analysis, Kernel, Symmetric matrices, Natural language processing, Vectors, embeddings, Dimensionality reduction, supervised, semi-supervised, multilabel classification
CITATION
Tingting Mu, John Yannis Goulermas, Jun'ichi Tsujii, Sophia Ananiadou, "Proximity-Based Frameworks for Generating Embeddings from Multi-Output Data", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.34, no. 11, pp. 2216-2232, Nov. 2012, doi:10.1109/TPAMI.2012.20
REFERENCES
[1] L. Sun, A. Korhonen, and Y. Krymolowski, "Automatic Classification of English Verbs Using Rich Syntactic Features," Proc. Third Int'l Joint Conf. Natural Language Processing, pp. 769-774, 2008.
[2] J. Björne, J. Heimonen, F. Ginter, A. Airola, T. Pahikkala, and T. Salakoski, "Extracting Complex Biological Events with Rich Graph-Based Feature Sets," Proc. Workshop Current Trends in Biomedical Natural Language Processing: Shared Task, pp. 10-18, 2009.
[3] D.D. Lewis, "Feature Selection and Feature Extraction for Text Categorization," Proc. Workshop Speech and Natural Language, pp. 212-217, 1992.
[4] R. Bekkerman, N. Tishby, Y. Winter, I. Guyon, and A. Elisseeff, "Distributional Word Clusters vs. Words for Text Categorization," J. Machine Learning Research, vol. 3, pp. 1183-1208, 2003.
[5] I.S. Dhillon, S. Mallela, and R. Kumar, "A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification," J. Machine Learning Research, vol. 3, pp. 1265-1287, 2003.
[6] S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. for Information Science, vol. 41, pp. 391-407, 1990.
[7] H. Kim, P. Howland, and H. Park, "Dimension Reduction in Text Classification with Support Vector Machines," J. Machine Learning Research, vol. 6, pp. 37-53, 2005.
[8] P.K. Chan, M.D.F. Schlag, and J.Y. Zien, "Spectral K-Way Ratio-Cut Partitioning and Clustering," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 9, pp. 1088-1096, Sept. 1994.
[9] F.R.K. Chung, Spectral Graph Theory, CBMS Regional Conf. Series in Math. Am. Math. Soc., 1997.
[10] S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[11] E. Kokiopoulou and Y. Saad, "Orthogonal Neighborhood Preserving Projections: A Projection-Based Dimensionality Reduction Technique," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2143-2156, Dec. 2007.
[12] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[13] A. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Proc. Advances in Neural Information Processing Systems, 2001.
[14] U. von Luxburg, "A Tutorial on Spectral Clustering," Statistics and Computing, vol. 17, no. 4, pp. 395-416, 2007.
[15] J.B. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000.
[16] M. Belkin and P. Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation," Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003.
[17] X. He and P. Niyogi, "Locality Preserving Projections," Proc. Neural Information Processing Systems, 2003.
[18] Y. Hou, P. Zhang, X. Xu, X. Zhang, and W. Li, "Nonlinear Dimensionality Reduction by Locally Linear Inlaying," IEEE Trans. Neural Networks, vol. 20, no. 2, pp. 300-315, Feb. 2009.
[19] T. Zhang, D. Tao, X. Li, and J. Yang, "Patch Alignment for Dimensionality Reduction," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 9, pp. 1299-1313, Sept. 2009.
[20] R.A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, no. 2, pp. 179-188, 1936.
[21] H. Li, T. Jiang, and K. Zhang, "Efficient and Robust Feature Extraction by Maximum Margin Criterion," IEEE Trans. Neural Networks, vol. 17, no. 1, pp. 157-165, Jan. 2006.
[22] T. Hastie and R. Tibshirani, "Discriminant Adaptive Nearest Neighbor Classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 607-616, June 1996.
[23] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40-51, Jan. 2007.
[24] M. Sugiyama, "Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis," J. Machine Learning Research, vol. 8, pp. 1027-1061, 2007.
[25] M. Sugiyama, T. Idé, S. Nakajima, and J. Sese, "Semi-Supervised Local Fisher Discriminant Analysis for Dimensionality Reduction," Machine Learning, vol. 78, nos. 1/2, pp. 35-61, 2010.
[26] W. Zhang, X. Xue, Z. Sun, Y. Guo, and H. Lu, "Optimal Dimensionality of Metric Space for Classification," Proc. 24th Int'l Conf. Machine Learning, vol. 227, pp. 1135-1142, 2007.
[27] E. Kokiopoulou and Y. Saad, "Enhanced Graph-Based Dimensionality Reduction with Repulsion Laplaceans," Pattern Recognition, vol. 42, pp. 2392-2402, 2009.
[28] S. Zhang, "Enhanced Supervised Locally Linear Embedding," Pattern Recognition Letters, vol. 30, no. 13, pp. 1208-1218, 2009.
[29] T. Zhang, K. Huang, X. Li, J. Yang, and D. Tao, "Discriminative Orthogonal Neighborhood-Preserving Projections for Classification," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, no. 1, pp. 253-263, Feb. 2010.
[30] C. Chen, J. Zhang, and R. Fleischer, "Distance Approximating Dimension Reduction of Riemannian Manifolds," IEEE Trans. Systems, Man, and Cybernetics, Part B, Cybernetics, vol. 40, no. 1, pp. 208-217, Feb. 2010.
[31] Y. Song, F. Nie, C. Zhang, and S. Xiang, "A Unified Framework for Semi-Supervised Dimensionality Reduction," Pattern Recognition, vol. 41, no. 9, pp. 2789-2799, 2008.
[32] R. Chatpatanasiri and B. Kijsirikul, "A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning," Neurocomputing, vol. 73, nos. 10-12, pp. 1631-1640, 2010.
[33] X. Zhu, "Semi-Supervised Learning Literature Survey," Technical Report 1530, Dept. of Computer Sciences, Univ. of Wisconsin, 2005.
[34] Y. Zhang, A.C. Surendran, J.C. Platt, and M. Narasimhan, "Learning from Multitopic Web Documents for Contextual Advertisement," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2008.
[35] Z. Barutcuoglu, R.E. Schapire, and O.G. Troyanskaya, "Hierarchical Multi-Label Prediction of Gene Function," Bioinformatics, vol. 22, no. 7, pp. 830-836, 2006.
[36] K. Yu, S. Yu, and V. Tresp, "Multi-Label Informed Latent Semantic Indexing," Proc. 28th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, 2005.
[37] S. Yu, K. Yu, V. Tresp, and H. Kriegel, "Multi-Output Regularized Feature Projection," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 12, pp. 1600-1613, Dec. 2006.
[38] G. Chen, Y. Song, F. Wang, and C. Zhang, "Semi-Supervised Multi-Label Learning by Solving a Sylvester Equation," Proc. Eighth SIAM Conf. Data Mining, pp. 410-419, 2008.
[39] Z. Zha, T. Mei, J. Wang, Z. Wang, and X. Hua, "Graph-Based Semi-Supervised Learning with Multiple Labels," J. Visual Comm. and Image Representation, vol. 20, no. 2, pp. 97-103, 2009.
[40] L. Sun, S. Ji, and J. Ye, "Hypergraph Spectral Learning for Multi-Label Classification," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 668-676, 2008.
[41] D.R. Hardoon, S.R. Szedmak, and J.R. Shawe-Taylor, "Canonical Correlation Analysis: An Overview with Application to Learning Methods," Neural Computation, vol. 16, no. 12, pp. 2639-2664, 2004.
[42] J. Arenas-García, K.B. Petersen, and L.K. Hansen, "Sparse Kernel Orthonormalized PLS for Feature Extraction in Large Data Sets," Proc. Conf. Neural Information Processing Systems, 2006.
[43] L. Sun, S. Ji, and J. Ye, "A Least Squares Formulation for a Class of Generalized Eigenvalue Problems in Machine Learning," Proc. Int'l Conf. Machine Learning, 2008.
[44] Y. Zhang and Z. Zhou, "Multi-Label Dimensionality Reduction via Dependence Maximization," Proc. 23rd Nat'l Conf. Artificial Intelligence, vol. 3, pp. 1503-1505, 2008.
[45] K.E. Hild II, D. Erdogmus, K. Torkkola, and J.C. Principe, "Feature Extraction Using Information-Theoretic Learning," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1385-1392, Sept. 2006.
[46] P. Daniusis and P. Vaitkus, "Supervised Feature Extraction Using Hilbert-Schmidt Norms," Proc. 10th Int'l Conf. Intelligent Data Eng. and Automated Learning, pp. 25-33, 2009.
[47] P. Rai and H. Daumé III, "Multi-Label Prediction via Sparse Infinite CCA," Proc. Conf. Neural Information Processing Systems, 2009.
[48] N.D. Lawrence, "Gaussian Process Models for Visualisation of High Dimensional Data," Proc. Advances in Neural Information Processing Systems, 2004.
[49] R. Urtasun and T. Darrell, "Discriminative Gaussian Process Latent Variable Model for Classification," Proc. 24th Int'l Conf. Machine Learning, 2007.
[50] N.D. Lawrence, "Spectral Dimensionality Reduction via Maximum Entropy," Proc. 14th Int'l Conf. Artificial Intelligence and Statistics, 2011.
[51] L. van der Maaten, "Discussions of Spectral Dimensionality Reduction via Maximum Entropy," Proc. 14th Int'l Conf. Artificial Intelligence and Statistics, 2011.
[52] I.T. Jolliffe, Principal Component Analysis. Springer-Verlag, 1986.
[53] W.S. Torgerson, "Multidimensional Scaling: I. Theory and Method," Psychometrika, vol. 17, no. 4, pp. 401-419, 1952.
[54] L. Zelnik-Manor and P. Perona, "Self-Tuning Spectral Clustering," Proc. Advances in Neural Information Processing Systems, pp. 1601-1608, 2005.
[55] I.S. Dhillon, "Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning," Proc. Seventh ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 269-274, 2001.
[56] D. Zhao and L. Yang, "Incremental Isometric Embedding of High-Dimensional Data Using Connected Neighborhood Graphs," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 86-98, Jan. 2009.
[57] F. Bach and M. Jordan, "Kernel Independent Component Analysis," J. Machine Learning Research, vol. 3, pp. 1-48, 2002.
[58] H. Wold, "Partial Least Squares," Encyclopedia of the Statistical Sciences, S. Kotz and N.L. Johnson, eds., vol. 6, pp. 581-591, John Wiley and Sons, 1985.
[59] S. Roweis and C. Brody, "Linear Heteroencoders," technical report, Gatsby Computational Neuroscience Unit, Alexandra House, 1999.
[60] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, "Measuring Statistical Dependence with Hilbert-Schmidt Norms," Proc. 16th Int'l Conf. Algorithmic Learning Theory, pp. 63-77, 2005.
[61] L. Song, A.J. Smola, A. Gretton, K.M. Borgwardt, and J. Bedo, "Supervised Feature Selection via Dependence Estimation," Proc. Int'l Conf. Machine Learning, pp. 823-830, 2007.
[62] D. Zhou, J. Huang, and B. Schölkopf, "Learning with Hypergraphs: Clustering, Classification, and Embedding," Proc. Advances in Neural Information Processing Systems, 2007.
[63] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, "Face Recognition Using Laplacianfaces," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328-340, Mar. 2005.
[64] Y. Bengio, O. Delalleau, N.L. Roux, J.F. Paiement, P. Vincent, and M. Ouimet, Feature Extraction, Foundations and Applications. Springer, 2006.
[65] E. Kokiopoulou, J. Chen, and Y. Saad, "Trace Optimization and Eigenproblems in Dimension Reduction Methods," Numerical Linear Algebra with Applications, vol. 18, pp. 565-602, 2010.
[66] Z. Zhang and H. Zha, "Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment," SIAM J. Scientific Computing, vol. 26, no. 1, pp. 313-338, 2005.
[67] M.R. Boutell, J. Luo, X. Shen, and C.M. Brown, "Learning Multi-Label Scene Classification," Pattern Recognition, vol. 37, no. 9, pp. 1757-1771, 2004.
[68] W. Chen, J. Yan, B. Zhang, Z. Chen, and Q. Yang, "Document Transformation for Multi-Label Feature Selection in Text Categorization," Proc. IEEE Int'l Conf. Data Mining, pp. 451-456, 2007.
[69] G. Tsoumakas, I. Katakis, and I. Vlahavas, "Mining Multi-Label Data," unpublished book chapter, 2009.
[70] P.N. Bennett and N. Nguyen, "Refined Experts: Improving Classification in Large Taxonomies," Proc. 32nd Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, 2009.
[71] J.Y. Goulermas, A.H. Findlow, C.J. Nester, P. Liatsis, X.J. Zeng, L.P.J. Kenney, P. Tresadern, S.B. Thies, and D. Howard, "An Instance-Based Algorithm with Auxiliary Similarity Information for the Estimation of Gait Kinematics from Wearable Sensors," IEEE Trans. Neural Networks, vol. 19, no. 9, pp. 1574-1582, Sept. 2008.
[72] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[73] J. Mercer, "Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations," Philosophical Trans. Royal Soc. of London, vol. 209, pp. 415-446, 1909.
[74] B. Schölkopf, C.J.C. Burges, and A.J. Smola, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[75] G.W. Stewart, Matrix Algorithms: Eigensystems, vol. 2. SIAM, 2001.
[76] E. Pekalska and R. Duin, "Dissimilarity Representations Allow for Building Good Classifiers," Pattern Recognition Letters, vol. 23, no. 8, pp. 943-956, 2002.
[77] E. Pekalska, R. Duin, and P. Paclik, "Prototype Selection for Dissimilarity-Based Classifiers," Pattern Recognition, vol. 39, no. 2, pp. 189-208, 2006.
[78] J.C. Bezdek, R.J. Hathaway, and J.M. Huband, "Visual Assessment of Clustering Tendency for Rectangular Dissimilarity Matrices," IEEE Trans. Fuzzy Systems, vol. 15, no. 5, pp. 890-903, Oct. 2007.