International Conference on Semantic Computing (ICSC 2007)
CLDA: Feature Selection for Text Categorization Based on Constrained LDA
Irvine, California
September 17-19, 2007
ISBN: 0-7695-2997-6
Cui Zifeng, Southeast University, China
Xu Baowen, Southeast University, China
Zhang Weifeng, Nanjing University of Posts and Telecommunications, China
Jiang Dawei, Southeast University, China
Xu Junling, Southeast University, China
Feature selection is a necessary step before pattern classification, machine learning, and data mining. It is now challenged by high-dimensional spaces, such as those arising in text categorization for information retrieval. Linear Discriminant Analysis (LDA) is an effective dimensionality reduction method that transforms the original data into a low-dimensional feature space. However, the transformation replaces the original physical features and makes the resulting features uninterpretable. This motivates us to select features rather than transform them, while keeping the LDA idea of preserving between-class and within-class structure information for text categorization. In this paper, we propose a new feature selection approach based on Constrained LDA (CLDA), which models feature selection as a search problem in a subspace and finds the optimal solution subject to certain restrictions. The CLDA optimization problem is then reduced to scoring and sorting features. Experiments on 20 Newsgroups and Reuters-21578 show that CLDA consistently outperforms information gain and the chi-square (χ²) test, with lower computational complexity.
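The abstract does not give the CLDA scoring formula, so the following is only a minimal sketch of the general "score and sort" idea, assuming a per-feature Fisher-style ratio of between-class scatter to within-class scatter. The function names `fisher_scores` and `select_top_k`, the `eps` smoothing term, and the toy term-count matrix are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fisher_scores(X, y, eps=1e-12):
    """Score each feature by between-class scatter over within-class scatter.

    Assumption: this Fisher-style ratio stands in for the CLDA score,
    which the abstract does not specify.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]                      # documents of one class
        class_mean = Xc.mean(axis=0)
        between += Xc.shape[0] * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # eps (hypothetical) avoids division by zero for constant features
    return between / (within + eps)


def select_top_k(X, y, k):
    """Return indices of the k highest-scoring features and all scores."""
    scores = fisher_scores(X, y)
    # Stable sort on negated scores: highest first, ties broken by index
    order = np.argsort(-scores, kind="stable")
    return order[:k], scores


# Toy term-count matrix: 4 documents, 3 terms, 2 classes (illustrative only)
X = np.array([
    [5, 1, 2],
    [4, 0, 2],
    [0, 1, 2],
    [1, 0, 2],
])
y = np.array([0, 0, 1, 1])

top, scores = select_top_k(X, y, k=1)
print(top, scores)
```

Because each feature is scored independently, the selected features remain the original terms, and the cost is one pass over the data per class plus a sort, which is the kind of procedure the abstract contrasts with a full LDA projection.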
Citation:
Cui Zifeng, Xu Baowen, Zhang Weifeng, Jiang Dawei, Xu Junling, "CLDA: Feature Selection for Text Categorization Based on Constrained LDA," International Conference on Semantic Computing (ICSC 2007), pp. 702-712, 2007