21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07)
Filtering Spam Using Kolmogorov Complexity Estimates
Niagara Falls, Ontario, Canada, May 21-23, 2007
ISBN: 0-7695-2847-3
This paper introduces an adaptive filter which filters spam email based on Kolmogorov complexity estimates. The complexity filter is first trained exactly like a Bayesian filter. Each email is mapped to a string representation in which the tokens or words are represented by either 0 or 1. Tokens associated with spam are represented by 1 whereas those associated with non-spam, or ham, are represented by 0. Common tokens are ignored. The Kolmogorov complexity of this string representation is estimated using run-length compression. If the resulting Kolmogorov complexity is low then the email is classified as spam. Otherwise the email is classified as ham. The complexity filter can filter messages almost twice as fast as a comparable Bayesian filter and achieve accuracy rates of 80% to 96%. While a Bayesian filter views an email as a "bag of words", the complexity filter uses token distribution information and is likely less vulnerable to statistical attack.
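The pipeline described in the abstract (train on labeled mail, map tokens to a 0/1 string, estimate complexity by run-length compression, then threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the token-labeling rule, the complexity measure, or the decision threshold, so the following are all assumptions made for illustration: whitespace tokenization, labeling a token by whichever class it was seen in more often, using "runs per symbol" as the complexity estimate, the default `threshold=0.5`, and the extra check that a low-complexity string must be mostly 1s before it is called spam (without it, a string of all 0s would also count as low complexity). All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def train(spam_emails, ham_emails):
    """Count token occurrences per class, as a Bayesian-style filter would."""
    spam_counts = Counter(tok for e in spam_emails for tok in e.lower().split())
    ham_counts = Counter(tok for e in ham_emails for tok in e.lower().split())
    return spam_counts, ham_counts


def to_bits(email, spam_counts, ham_counts):
    """Map tokens to '1' (spam-associated) or '0' (ham-associated).

    Assumption: tokens seen equally often in both classes, including
    tokens never seen in training, are treated as common and dropped.
    """
    bits = []
    for tok in email.lower().split():
        s, h = spam_counts[tok], ham_counts[tok]
        if s > h:
            bits.append("1")
        elif h > s:
            bits.append("0")
    return "".join(bits)


def run_length_encode(bits):
    """Return a list of (symbol, run_length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(bits)]


def complexity_estimate(bits):
    """Assumed measure: number of runs divided by string length.

    Long uniform runs give values near 0; strict alternation gives 1.0.
    An empty string is treated as maximally complex.
    """
    if not bits:
        return 1.0
    return len(run_length_encode(bits)) / len(bits)


def classify(email, spam_counts, ham_counts, threshold=0.5):
    """Label an email 'spam' if its bit string is low-complexity and mostly 1s."""
    bits = to_bits(email, spam_counts, ham_counts)
    low_complexity = complexity_estimate(bits) < threshold
    mostly_spam = bits.count("1") > len(bits) / 2  # assumption, see lead-in
    return "spam" if low_complexity and mostly_spam else "ham"


if __name__ == "__main__":
    spam = ["buy cheap pills now", "cheap pills buy now"]
    ham = ["meeting agenda for monday", "monday meeting notes"]
    sc, hc = train(spam, ham)
    for msg in ["buy cheap pills now", "buy meeting cheap monday"]:
        print(msg, "->", to_bits(msg, sc, hc), classify(msg, sc, hc))
```

Because the estimate depends on the order of the 0s and 1s rather than only their counts, the sketch reflects the abstract's point that the complexity filter uses token distribution information, where a "bag of words" model would treat `1100` and `1010` identically.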
Citation:
L.M. Spracklin, L.V. Saxton, "Filtering Spam Using Kolmogorov Complexity Estimates," ainaw, vol. 1, pp. 321-328, 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), 2007