2012 Conference on Technologies and Applications of Artificial Intelligence (TAAI) (2012)
Nov. 16, 2012 to Nov. 18, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TAAI.2012.16
Text classification is an important research topic for managing large numbers of electronic documents. Feature reduction is a key issue in text classification because keyword spaces are high dimensional. A document analysis method based on the discriminant coefficient was previously proposed to reduce features and achieve high-precision text classification. However, the main problem of this discriminant-based feature reduction method is that the final number of reduced features is exactly equal to the number of document classes. Although classification precision is high with such a method, recall is relatively low. In this paper, we propose an improvement to the analysis method used in discriminant coefficients. We apply a simple clustering method to the documents in each document class in order to preserve hidden differences among keywords within the same class. The clustering results help adjust the number of reduced features flexibly. The experimental results show that the proposed clustering mechanism supports adaptive feature reduction and that both recall and the F1 measure are improved.
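As a rough illustration of the idea in the abstract, the following Python sketch clusters the documents of each class and emits one reduced feature per cluster, so the reduced dimensionality equals the total number of clusters rather than the number of classes. The abstract does not give the discriminant coefficient formula or name the clustering algorithm, so cosine similarity to a cluster centroid and a basic k-means are stand-ins assumed here, and all function names and the toy data are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two keyword-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iters=20, seed=0):
    """Basic k-means using cosine similarity; an assumed stand-in for the
    paper's unspecified 'simple clustering method'."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # Keep the previous centroid when a cluster ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def fit_cluster_centroids(docs_by_class, clusters_per_class):
    """Cluster each class separately and return (class_label, centroid) pairs."""
    centroids = []
    for label, docs in docs_by_class.items():
        k = min(clusters_per_class.get(label, 1), len(docs))
        for centroid in kmeans(docs, k):
            centroids.append((label, centroid))
    return centroids


def reduce_features(doc, centroids):
    """Map a document to one feature per cluster (similarity to each centroid).
    This similarity is an assumption, not the paper's discriminant coefficient."""
    return [cosine(doc, centroid) for _, centroid in centroids]


# Toy keyword-count vectors (hypothetical data, four keywords).
docs_by_class = {
    "sports": [[3, 0, 0, 1], [4, 1, 0, 0], [0, 3, 0, 1], [0, 4, 1, 0]],
    "tech": [[0, 0, 5, 1], [0, 1, 4, 0]],
}
centroids = fit_cluster_centroids(docs_by_class, {"sports": 2, "tech": 1})
reduced = reduce_features([2, 0, 0, 1], centroids)
print(len(reduced))  # 3 features: 2 sports clusters + 1 tech cluster
```

With one cluster per class, this sketch falls back to the situation the abstract criticizes, where the number of reduced features equals the number of classes; raising the cluster count for a class is what lets the feature count vary.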
feature extraction, pattern classification, pattern clustering, statistical analysis, text analysis
L. Gao and B. Chien, "Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient," 2012 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Tainan, Taiwan, 2013, pp. 137-142.