2015 IEEE International Conference on Information Reuse and Integration (IRI) (2015)
San Francisco, CA, USA
Aug. 13, 2015 to Aug. 15, 2015
ISBN: 978-1-4673-6656-4
pp: 403-410
ABSTRACT
Social websites such as Twitter and Facebook strive to detect and remove URL spam in order to keep their users happy and coming back. Researchers have already proposed many filtering approaches, such as SpamRank and TrustRank, most of which detect URL spam through content analysis of the Web pages behind the URLs or link analysis of the Web graph. Nevertheless, automatically detecting URL spam in social media remains challenging because spammers keep evolving their techniques, for example by cloaking based on IP addresses and by using multiple user accounts and redirectors. In this paper, we introduce BEAN, a behavior analysis technique that detects URL spam by capturing the anomalous message-sending behaviors of spammers. Twitter is an ideal place for our analysis due to its popularity and real-time properties. We collect over 2.4 million tweets from around a million users, based on Twitter trending topics, over 4 months. We apply our behavior analysis approach, derived from a Markov Chain model, to the Twitter dataset and achieve a precision of 0.91 and a recall of 0.88. In doing so, we detect a substantial amount of URL spam that cannot be filtered out by conventional approaches such as SVM and TrustRank, indicating that our approach is a good complement to existing URL spam detection techniques. We further investigate the anomalous behavior patterns of spammers in spreading URL spam to confirm our assumption.
INDEX TERMS
Uniform resource locators, Twitter, Market research, Measurement, Markov processes, Unsolicited electronic mail, Yttrium
CITATION

D. Wang and C. Pu, "BEAN: A BEhavior ANalysis Approach of URL Spam Filtering in Twitter," 2015 IEEE International Conference on Information Reuse and Integration (IRI), San Francisco, CA, USA, 2015, pp. 403-410.
doi:10.1109/IRI.2015.69