Issue No.06 - Nov.-Dec. (2012 vol.9)
pp: 811-824
Zi Chu, Twitter Inc., San Francisco
Steven Gianvecchio, MITRE Corporation, McLean
Haining Wang, College of William and Mary, Williamsburg
Sushil Jajodia, George Mason University, Fairfax
ABSTRACT
Twitter is a new web application that serves the dual roles of online social networking and microblogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which are a double-edged sword for Twitter. Legitimate bots generate a large number of benign tweets that deliver news and update feeds, while malicious bots spread spam or malicious content. More interestingly, between human and bot there has emerged the cyborg, which refers to either a bot-assisted human or a human-assisted bot. To help human users identify who they are interacting with, this paper focuses on classifying human, bot, and cyborg accounts on Twitter. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots, and cyborgs in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system that includes the following four parts: 1) an entropy-based component, 2) a spam detection component, 3) an account properties component, and 4) a decision maker. The system uses a combination of features extracted from an unknown user to determine the likelihood of that user being a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
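To give a rough sense of what an entropy-based component might measure, the sketch below computes the plain Shannon entropy of binned gaps between consecutive posts. This is an illustrative stand-in, not the paper's method: the abstract does not specify the entropy measure, and the function name, the 60-second bin width, and the sample timestamps are all assumptions made here. The intuition is that a strictly scheduled account yields very regular gaps (low entropy), while irregular posting spreads the gaps over many bins (higher entropy).

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of binned gaps between consecutive posts.

    `timestamps` are posting times in seconds; `bin_seconds` is an
    assumed bin width, not a value taken from the paper.
    """
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Group each gap into a coarse bin so near-identical gaps count together.
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical data: one account posting every 10 minutes, one irregular.
periodic = [i * 600 for i in range(20)]
irregular = [0, 45, 400, 2000, 2100, 9000, 9500, 20000, 20030, 50000]

print(interval_entropy(periodic))
print(interval_entropy(irregular))
```

On its own, a score like this would be only one input; the abstract describes combining it with spam-detection and account-property features in a decision maker.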
INDEX TERMS
Twitter, Social network services, Identification, Electronic mail, Blogs, social networks, Automatic identification, bot, cyborg
CITATION
Zi Chu, Steven Gianvecchio, Haining Wang, Sushil Jajodia, "Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg?", IEEE Transactions on Dependable and Secure Computing, vol.9, no. 6, pp. 811-824, Nov.-Dec. 2012, doi:10.1109/TDSC.2012.75