Seventh IEEE International Conference on Data Mining (ICDM 2007)
Omaha, Nebraska, USA
Oct. 28, 2007 to Oct. 31, 2007
DOI: http://doi.ieeecomputersociety.org/10.1109/ICDM.2007.18
Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework for the multi-label feature selection problem, in which multi-label documents are transformed into single-label documents before standard feature selection algorithms are applied. Under this framework, we undertake a comparative study of four intuitive document transformation approaches and propose a novel approach called Entropy-based Label Assignment (ELA), which assigns label weights to a multi-label document based on label entropy. Three standard feature selection algorithms are used to evaluate the document transformation approaches and to verify their impact on multi-class text categorization. Using an SVM classifier and two multi-label benchmark text collections, we show that the choice of document transformation approach can significantly influence multi-class categorization performance, and that our proposed approach, ELA, can achieve better performance than all the other approaches.
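The transformation framework described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `transform`, `uniform_weights`, `chi_square_scores` and `select_top_k` are hypothetical. The default weighting simply splits each document's unit weight evenly over its labels; ELA would instead derive these weights from label entropy, and the abstract does not give its exact formula. A weighted chi-square statistic stands in for the standard single-label feature selection algorithms, and the SVM classification step is not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

MultiDoc = Tuple[Set[str], Set[str]]     # (terms, labels)
SingleDoc = Tuple[Set[str], str, float]  # (terms, single label, weight)


def uniform_weights(labels: Set[str]) -> Dict[str, float]:
    """Hypothetical placeholder: spread a document's unit weight evenly over
    its labels. An entropy-based scheme such as ELA would plug in here."""
    return {label: 1.0 / len(labels) for label in labels}


def transform(docs: List[MultiDoc],
              weight_fn: Callable[[Set[str]], Dict[str, float]] = uniform_weights,
              ) -> List[SingleDoc]:
    """Turn each multi-label document into one weighted single-label copy per label."""
    out: List[SingleDoc] = []
    for terms, labels in docs:
        for label, w in sorted(weight_fn(labels).items()):
            out.append((terms, label, w))
    return out


def chi_square_scores(single: List[SingleDoc]) -> Dict[str, float]:
    """Weighted chi-square statistic for each term, maximised over labels.
    Counts are sums of document weights rather than integer document counts."""
    labels = {label for _, label, _ in single}
    vocab = set().union(*(terms for terms, _, _ in single))
    scores: Dict[str, float] = {}
    for t in vocab:
        best = 0.0
        for c in labels:
            # Weighted 2x2 contingency table for term t and label c.
            a = sum(w for terms, l, w in single if t in terms and l == c)
            b = sum(w for terms, l, w in single if t in terms and l != c)
            cc = sum(w for terms, l, w in single if t not in terms and l == c)
            d = sum(w for terms, l, w in single if t not in terms and l != c)
            n = a + b + cc + d
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom > 0:  # a zero marginal means the statistic is undefined; skip it
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        scores[t] = best
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, given three toy documents where one carries both a `sports` and a `politics` label, `transform` emits four weighted single-label copies whose weights still sum to one per original document, and a term that occurs in every document receives a chi-square score of 0, so `select_top_k` ranks it last. Passing a different `weight_fn` changes only the label weights, which is what lets different transformation approaches be compared under the same feature selection algorithm.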
W. Chen, Z. Chen, Q. Yang, B. Zhang and J. Yan, "Document Transformation for Multi-label Feature Selection in Text Categorization," Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, Nebraska, USA, 2007, pp. 451-456.