Fourth IEEE International Conference on Data Mining (ICDM'04)
Supervised Latent Semantic Indexing for Document Categorization
Brighton, United Kingdom
November 1-4, 2004
ISBN: 0-7695-2142-8
Jian-Tao Sun, Tsinghua University, Beijing, P.R. China
Zheng Chen, Microsoft Research Asia, P.R. China
Hua-Jun Zeng, Microsoft Research Asia, P.R. China
Yu-Chang Lu, Tsinghua University, Beijing, P.R. China
Chun-Yi Shi, Tsinghua University, Beijing, P.R. China
Wei-Ying Ma, Microsoft Research Asia, P.R. China
Latent Semantic Indexing (LSI) is a successful information retrieval (IR) technique that attempts to uncover the latent semantics of a query or a document by representing it in a dimension-reduced space. However, LSI is not optimal for document categorization because it seeks the most representative features for document representation rather than the most discriminative ones. In this paper, we propose Supervised LSI (SLSI), which iteratively selects the most discriminative basis vectors using the training data. The extracted vectors are then used to project the documents into a reduced-dimensional space for better classification. Experimental evaluations show that the SLSI approach leads to dramatic dimension reduction while achieving good classification results.
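For orientation, the projection step that the abstract refers to can be sketched with plain, unsupervised LSI: take a truncated SVD of a term-document matrix and project each document onto the top-k left singular vectors. This is only a minimal sketch of standard LSI, not the SLSI procedure, whose basis-selection details are not given in the abstract. The function name `lsi_project` and the toy matrix `X` are hypothetical and exist only for illustration.

```python
import numpy as np

def lsi_project(term_doc, k):
    """Plain LSI (assumed setup): project documents onto the top-k left singular vectors.

    term_doc is a terms x documents matrix. Returns the terms x k basis and
    the k x documents reduced representation.
    """
    # Thin SVD; singular values come back sorted in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    basis = U[:, :k]                  # the k most "representative" directions
    return basis, basis.T @ term_doc  # documents in the reduced space

# Made-up term counts (rows: terms, columns: documents), for illustration only.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

basis, docs_k = lsi_project(X, k=2)
print(docs_k.shape)  # (2, 4): four documents, each reduced to two dimensions
```

Note that the basis here is chosen purely by singular value, without looking at class labels. That is the sense in which the abstract calls LSI features representative rather than discriminative.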
Citation:
Jian-Tao Sun, Zheng Chen, Hua-Jun Zeng, Yu-Chang Lu, Chun-Yi Shi, Wei-Ying Ma, "Supervised Latent Semantic Indexing for Document Categorization," Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 535-538, 2004