21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07)
Filtering Spam Using Kolmogorov Complexity Estimates
Niagara Falls, Ontario, Canada
May 21-May 23
ISBN: 0-7695-2847-3
L.M. Spracklin, University of Ottawa, Canada
L.V. Saxton, University of Regina, Canada
This paper introduces an adaptive filter that filters spam email based on Kolmogorov complexity estimates. The complexity filter is first trained exactly like a Bayesian filter. Each email is then mapped to a string representation in which each token, or word, is represented by either 0 or 1. Tokens associated with spam are represented by 1, whereas those associated with non-spam, or ham, are represented by 0. Common tokens are ignored.
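
The mapping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `email_to_bits`, the `spam_prob` table of per-token spam probabilities (standing in for the output of Bayesian-style training), the tokenizer, and the `low`/`high` cut-offs used to decide which tokens count as "common" are all assumptions made for the example.

```python
import re

def email_to_bits(text, spam_prob, low=0.4, high=0.6):
    """Map an email to a string of '0' and '1' characters.

    spam_prob: hypothetical dict of token -> estimated probability that the
    token indicates spam, as a trained Bayesian-style filter might provide.
    Tokens that are unknown, or whose probability lies strictly between
    `low` and `high`, are treated as common and ignored (an assumption).
    """
    bits = []
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        p = spam_prob.get(token)
        if p is None or low < p < high:
            continue  # unknown or common token: contributes nothing
        bits.append("1" if p >= high else "0")  # 1 = spam-like, 0 = ham-like
    return "".join(bits)

# Toy probabilities, invented purely for illustration.
toy_probs = {"free": 0.9, "viagra": 0.99, "meeting": 0.05, "the": 0.5}
print(email_to_bits("Free viagra the meeting", toy_probs))
```

Because the tokens are emitted in message order, the resulting string keeps the token distribution information that a bag-of-words representation discards.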

The Kolmogorov complexity of this string representation is estimated using run-length compression. If the resulting complexity estimate is low, the email is classified as spam; otherwise, it is classified as ham.
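
A rough sketch of the estimation and decision step is given below. It applies the rule exactly as stated in this abstract and nothing more. The normalization (runs per symbol), the `threshold` value, the treatment of an empty string as ham, and all function names are assumptions chosen for the example; the paper's actual estimator and decision threshold may differ.

```python
from itertools import groupby

def run_length_encode(bits):
    """Return the run-length encoding of a 0/1 string as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]

def complexity_estimate(bits):
    """Crude complexity proxy: number of runs divided by string length.

    Long runs compress well under run-length encoding, so the ratio is
    small; a string that alternates at every position scores 1.0.
    """
    if not bits:
        raise ValueError("complexity is undefined for an empty string")
    return len(run_length_encode(bits)) / len(bits)

def classify(bits, threshold=0.5):
    """Label a 0/1 string 'spam' if its complexity estimate is below threshold.

    The threshold is a hypothetical value; an empty string (no informative
    tokens) is labelled 'ham' here, which is also an assumption.
    """
    if not bits:
        return "ham"
    return "spam" if complexity_estimate(bits) < threshold else "ham"

print(run_length_encode("11111110"))
print(classify("11111110"), classify("1010"))
```

Counting runs takes a single pass over the string, which is consistent with the kind of lightweight computation the abstract describes.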

The complexity filter can filter messages almost twice as fast as a comparable Bayesian filter and achieve accuracy rates of 80% to 96%. While a Bayesian filter views an email as a "bag of words", the complexity filter uses token distribution information and is likely less vulnerable to statistical attack.

Citation:
L.M. Spracklin, L.V. Saxton, "Filtering Spam Using Kolmogorov Complexity Estimates," ainaw, vol. 1, pp. 321-328, 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), 2007.