loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Fifth IEEE International Conference on Data Mining (ICDM'05)
Bias Analysis in Text Classification for Highly Skewed Data
Houston, Texas
November 27-November 30
ISBN: 0-7695-2278-5
Lei Tang, Arizona State University
Huan Liu, Arizona State University
Feature selection is often applied to high-dimensional data as a preprocessing step in text classification. When dealing with highly skewed data, we observe that typical feature selection metrics like information gain or chi-squared are biased toward selecting features for the minor class, and the metric of bi-normal separation can select features for both minor and major classes. In this work, we investigate how these feature selection metrics impact on the performance of frequently used classifiers such as Decision Trees, Na??ve Bayes, and Support Vector Machines via bias analysis for highly skewed data. Three types of biases are metric bias, class bias, and classifier bias. Extensive experiments are designed to understand how these biases can be employed in concert and efficiently to achieve good classification performance. We report our findings and present recommended approaches to text classification based on bias analysis and the empirical study.
Citation:
Lei Tang, Huan Liu, "Bias Analysis in Text Classification for Highly Skewed Data," icdm, pp.781-784, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.