Issue No.05 - May (2009 vol.20)
pp: 725-739
Kang Li, University of Georgia, Athens
Zhenyu Zhong, Secure Computing, Alpharetta
Lakshmish Ramaswamy, University of Georgia, Athens
While collaboration provides a natural defense against massive spam e-mails directed at large numbers of recipients, designing effective collaborative anti-spam systems raises several important research challenges. First and foremost, since e-mails may contain confidential information, any collaborative anti-spam approach has to guarantee strong privacy protection to the participating entities. Second, the continuously evolving nature of spam demands that collaborative techniques be resilient to various kinds of camouflage attacks. Third, the collaboration has to be lightweight, efficient, and scalable. To address these challenges, this paper presents ALPACAS, a privacy-aware framework for collaborative spam filtering. In designing the ALPACAS framework, we make two unique contributions. The first is a feature-preserving message transformation technique that is highly resilient against the latest kinds of spam attacks. The second is a privacy-preserving protocol that provides enhanced privacy guarantees to the participating entities. Experiments conducted on a real e-mail data set show that the proposed framework provides a 10-fold improvement in the false negative rate over the Bayesian-based Bogofilter when faced with one of the recent kinds of spam attacks. Further, privacy breaches are extremely rare, which demonstrates the strong privacy protection provided by the ALPACAS system.
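The abstract names a feature-preserving message transformation but does not describe how it works. As a rough illustration of the general idea only, the sketch below assumes a word-shingle-and-hash approach: each message is reduced to a small set of one-way digests, and collaborators compare digest sets rather than message text. The function names (`feature_set`, `resemblance`), the shingle length, the `keep` limit, and the use of SHA-1 are all hypothetical choices for this example and are not taken from the paper.

```python
import hashlib
import re


def feature_set(message, shingle_len=3, keep=8):
    """Reduce a message to at most `keep` hashed word shingles.

    Hypothetical illustration: the shingle length, the digest function,
    and the "keep the smallest digests" rule are assumptions of this sketch.
    """
    # Normalize: lowercase, keep only alphanumeric word tokens.
    words = re.findall(r"[a-z0-9]+", message.lower())
    if not words:
        return set()
    # Overlapping word windows; a short message yields a single shingle.
    last_start = max(len(words) - shingle_len, 0)
    shingles = {
        " ".join(words[i:i + shingle_len]) for i in range(last_start + 1)
    }
    # One-way digests, truncated; only these would ever be shared.
    digests = sorted(
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles
    )
    return set(digests[:keep])


def resemblance(a, b):
    """Jaccard overlap between two digest sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    spam = "Buy cheap pills now, limited offer, click here"
    # Camouflage: the same spam padded with innocuous-looking words.
    camouflaged = spam + " meeting agenda quarterly report attached"
    ham = "Lunch at noon tomorrow works for me"

    full = 1000  # large enough to keep every shingle of these short messages
    print(resemblance(feature_set(spam, keep=full),
                      feature_set(camouflaged, keep=full)))
    print(resemblance(feature_set(spam, keep=full),
                      feature_set(ham, keep=full)))
```

In this sketch, padding a spam message with extra words leaves its original shingles intact, so the digest sets still overlap substantially, while an unrelated message shares none. Whether ALPACAS uses this construction is not stated in the abstract.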
Distributed systems, collaboration, spam, privacy.
Kang Li, Zhenyu Zhong, Lakshmish Ramaswamy, "Privacy-Aware Collaborative Spam Filtering", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 5, pp. 725-739, May 2009, doi:10.1109/TPDS.2008.143