Proliferation and Detection of Blog Spam
September/October 2010 (vol. 8, no. 5)
pp. 42-47
Saeed Abu-Nimeh, Websense, San Diego
Thomas Chen, Swansea University, Swansea
The ease of posting comments and links on blogs has made them an attractive alternative to conventional email for spammers. An experimental study investigates the nature and prevalence of blog spam. Using Defensio logs, the authors collected and analyzed more than one million blog comments posted during the last two weeks of June 2009. They used a support vector machine (SVM) classifier combined with heuristics to identify spam posters' IP addresses, autonomous system numbers (ASNs), and IP blocks. Experimental results show that more than 75 percent of blog comments posted during the reporting period were spam. The results also show that blog spammers likely operate from a small number of colocation facilities.
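
The aggregation step the abstract describes, going from per-comment spam labels to a spam percentage and a ranking of the IP blocks that spam originates from, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the spam labels are assumed to come from a classifier (such as the SVM mentioned above, which is not shown), the poster addresses are assumed to be IPv4, "IP block" is assumed to mean a /24 network, and the function names and sample addresses (taken from the reserved documentation ranges) are hypothetical. ASN lookup would require an external routing or registry database and is omitted.

```python
import ipaddress
from collections import Counter


def spam_share(labels):
    """Return the fraction of comments labeled spam.

    `labels` is a list of (poster_ip, is_spam) pairs; the labels are assumed
    to come from an upstream classifier (hypothetical input format).
    """
    if not labels:
        return 0.0
    return sum(1 for _, is_spam in labels if is_spam) / len(labels)


def top_spam_blocks(labels, prefix_len=24, k=5):
    """Group spam-labeled comments by the poster's network block and return
    the k blocks with the most spam comments as (block, count) pairs.

    The /24 default block size is an illustrative assumption.
    """
    counts = Counter()
    for ip, is_spam in labels:
        if not is_spam:
            continue
        # strict=False masks off the host bits, e.g. 203.0.113.5 -> 203.0.113.0/24
        block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(block)] += 1
    return counts.most_common(k)


# Hypothetical labeled comments using documentation-only address ranges.
comments = [
    ("203.0.113.5", True),
    ("203.0.113.77", True),
    ("203.0.113.200", True),
    ("198.51.100.9", True),
    ("192.0.2.44", False),
]

print(spam_share(comments))               # 0.8
print(top_spam_blocks(comments, k=2))     # [('203.0.113.0/24', 3), ('198.51.100.0/24', 1)]
```

Counting spam per block rather than per address is what would make a concentration of spammers in a few hosting or colocation networks visible, since many distinct addresses in the same block collapse into one heavily weighted entry.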

Index Terms:
network-level security and protection; Web browser.
Citation:
Saeed Abu-Nimeh, Thomas Chen, "Proliferation and Detection of Blog Spam," IEEE Security & Privacy, vol. 8, no. 5, pp. 42-47, Sept.-Oct. 2010, doi:10.1109/MSP.2010.113