2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06)
Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2747-7
Helmut Berger, EC3, Austria
Michael Dittenbach, EC3, Austria
Dieter Merkl, Technische Universität Wien, Austria
This paper reports on experiments in multi-class document categorization with supervised machine learning techniques. The document collection consists of a set of personal e-mail messages. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on these document representations, the categorization performance of five machine learning approaches is assessed and compared. In principle, both document representations yielded comparable results with the various classifiers. However, the n-gram-based document representation gave clearly better results when an aggressive feature selection strategy was applied.
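To make the two representations concrete, the following minimal Python sketch builds a word-based and a character n-gram feature vector for a single message. It is an illustration only: the abstract does not state the n-gram length, the preprocessing, or the feature selection method used in the paper, so the trigram setting (`n=3`), the tokenization rule, and the frequency-based `top_k` cut-off are all assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def word_features(text: str) -> Counter:
    """Word-based representation: counts of lowercase alphabetic tokens.

    The tokenization rule is an assumption, not the paper's preprocessing.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Character n-gram representation: counts of overlapping n-grams.

    Text is lowercased and runs of whitespace are collapsed to one space.
    n=3 is an illustrative choice; the abstract does not give the value.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def top_k(features: Counter, k: int) -> Counter:
    """Crude stand-in for aggressive feature selection: keep the k most
    frequent features. The paper's actual selection criterion may differ."""
    return Counter(dict(features.most_common(k)))


# Hypothetical e-mail text, used only for illustration.
message = "Meeting moved to Monday; meeting room unchanged."
words = word_features(message)
trigrams = char_ngram_features(message, n=3)
reduced = top_k(trigrams, 5)
```

Either `Counter` can be turned into a fixed-length vector over a shared vocabulary and passed to any supervised classifier. Character n-grams overlap across word boundaries and inflected forms, so a small retained feature set still touches much of the text.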
Citation:
Helmut Berger, Michael Dittenbach, Dieter Merkl, "Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 297-300, 2006