Third IEEE International Conference on Data Mining (ICDM'03)
A Feature Selection Framework for Text Filtering
Melbourne, Florida
November 19-22, 2003
ISBN: 0-7695-1978-4
This paper presents a new framework for local feature selection in text filtering. In this framework, a feature set is constructed for each category by first selecting a set of terms highly indicative of membership (the positive set) and another set of terms highly indicative of non-membership (the negative set), and then combining the two sets. The framework not only unifies several standard feature selection methods, but also facilitates the proposal of a new method that optimally combines the positive and negative sets. The proposed method was compared experimentally with standard methods using six feature selection metrics: chi-square, correlation coefficient, odds ratio, GSS coefficient, and two proposed variants of odds ratio and GSS coefficient, named OR-square and GSS-square respectively. The results show that the proposed feature selection method improves text filtering performance.
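The per-category construction described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `positive_fraction` parameter, and the toy documents are assumptions made for illustration, and the abstract does not say how the paper chooses the optimal combination. The sketch scores terms with the signed correlation coefficient (whose square is the chi-square statistic), takes the top-scoring terms as the positive set and the most negative-scoring terms as the negative set, and splits a fixed feature budget between them.

```python
import math
from collections import Counter

def correlation_coefficient(a, b, c, d):
    """Signed correlation coefficient from a 2x2 term/category table.

    a: in-category docs containing the term
    b: out-of-category docs containing the term
    c: in-category docs lacking the term
    d: out-of-category docs lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: treat the term as uninformative
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)

def select_features(docs, labels, size, positive_fraction):
    """Build a per-category feature set from a positive and a negative set.

    docs: list of sets of terms; labels: list of bools (True = in category).
    positive_fraction is a hypothetical knob for splitting the budget `size`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    df_pos, df_neg = Counter(), Counter()  # document frequencies per class
    for terms, in_category in zip(docs, labels):
        (df_pos if in_category else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a, b = df_pos[term], df_neg[term]
        scores[term] = correlation_coefficient(a, b, n_pos - a, n_neg - b)

    # Rank from most indicative of membership to most indicative of non-membership.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    k_pos = round(size * positive_fraction)
    k_neg = size - k_pos
    positive = [t for t in ranked if scores[t] > 0][:k_pos]
    negative = [t for t in reversed(ranked) if scores[t] < 0][:k_neg]
    return positive, negative

# Toy corpus (invented for illustration): two in-category and two out-of-category docs.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = [True, True, False, False]
positive, negative = select_features(docs, labels, size=4, positive_fraction=0.5)
print(positive, negative)
```

Under this sketch, setting `positive_fraction` to 1.0 keeps only terms indicative of membership, while intermediate values mix in terms indicative of non-membership; treating the split as a tunable quantity is one way to read the "combining" step of the framework.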
Citation:
Zhaohui Zheng, Rohini Srihari, Sargur Srihari, "A Feature Selection Framework for Text Filtering," Third IEEE International Conference on Data Mining (ICDM'03), p. 705, 2003