Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
Learning Term Dependency Links Using Information Theoretic Inclusion Measure
Omaha, Nebraska, USA
October 28-October 31
ISBN: 0-7695-3033-8
An algorithm to identify and remove term redundancy is proposed for text classifiers using ranking-based feature selection. The proposed method employs a normalized mutual information, called the inclusion measure, to estimate the asymmetric dependency between two terms. Based on pair-wise dependency measures, a dependency matrix is constructed. In this paper, an algorithm is proposed to learn term dependency links from the term dependency matrix and to visualize the dependency between terms in a graph called the term dependency tree. All nodes of the tree are categorized into two groups: hubs and links. Any node whose outdegree is less than two joins the links group. We show that all link nodes are most likely redundant. We also introduce a criterion, called substitution cost, to decide whether to remove or retain a candidate redundant term. The proposed approach is applied to four well-known benchmark data sets with an SVM and a Rocchio classifier using a set of highly aggressive feature selection schemes. The results show the effectiveness of the proposed method, especially when applied to weak classifiers.
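The pipeline described above (pair-wise asymmetric dependency, dependency matrix, hub/link grouping) can be illustrated with a minimal Python sketch. The abstract does not give the formula for the inclusion measure or the tree-construction rule, so both are assumptions here: inclusion is taken to be I(X;Y) / H(X), and each term is attached to the single term that best explains it. The function names are hypothetical, and the substitution cost criterion is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def inclusion(xs, ys):
    """ASSUMED form of the inclusion measure: I(X;Y) / H(X).

    Normalizing by H(X) only makes the measure asymmetric:
    inclusion(xs, ys) generally differs from inclusion(ys, xs).
    """
    hx = entropy(xs)
    return mutual_information(xs, ys) / hx if hx > 0 else 0.0


def dependency_matrix(occ):
    """occ maps term -> list of 0/1 occurrence flags, one per document."""
    terms = list(occ)
    return {a: {b: inclusion(occ[a], occ[b]) for b in terms if b != a}
            for a in terms}


def classify_nodes(matrix):
    """Split terms into hubs and links by outdegree.

    ASSUMED linking rule: each term receives one edge from the term that
    best explains it (the largest entry in its matrix row). Following the
    abstract, nodes with outdegree below two are links; the rest are hubs.
    """
    outdegree = {t: 0 for t in matrix}
    for term, row in matrix.items():
        parent = max(row, key=row.get)
        outdegree[parent] += 1
    hubs = {t for t, d in outdegree.items() if d >= 2}
    links = {t for t, d in outdegree.items() if d < 2}
    return hubs, links
```

For example, `classify_nodes(dependency_matrix(occ))` with `occ` holding per-document occurrence flags for each term returns the two groups; under the abstract's claim, the terms in the links group would be the candidates for removal.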
Citation:
Masoud Makrehchi, Mohamed S. Kamel, "Learning Term Dependency Links Using Information Theoretic Inclusion Measure," Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 423-428, 2007