Issue No.06 - June (2012 vol.24)
pp: 1080-1091
Lei Tang, Yahoo! Labs, Santa Clara
Xufei Wang, Arizona State University, Tempe
Huan Liu, Arizona State University, Tempe
ABSTRACT
The study of collective behavior seeks to understand how individuals behave in a social networking environment. Oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present both opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown to be effective in addressing the heterogeneity of connections in social media. However, networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks calls for scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while achieving prediction performance comparable to that of other, nonscalable methods.
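The edge-centric idea can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: each edge is represented by an indicator vector over its two endpoints, edges are grouped with a plain k-means using cosine similarity, and a node's social dimensions are taken to be the clusters of its incident edges. The function names `cluster_edges` and `social_dimensions`, the seeding rule, and the toy graph are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cluster_edges(edges, k, max_iter=20):
    """Partition edges into (at most) k groups with a plain cosine k-means.

    Assumption: an edge (u, v) is a vector with 1.0 at u and v, 0 elsewhere.
    """
    # Hypothetical deterministic seeding: k evenly spaced edges as centroids.
    step = max(1, len(edges) // k)
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[::step][:k]]
    labels = None
    for _ in range(max_iter):
        norms = [math.sqrt(sum(w * w for w in c.values())) for c in centroids]

        def similarity(edge, j):
            # Cosine similarity between the edge vector and centroid j.
            dot = sum(centroids[j].get(n, 0.0) for n in edge)
            return dot / (math.sqrt(2.0) * norms[j]) if norms[j] else 0.0

        new_labels = [
            max(range(len(centroids)), key=lambda j: similarity(e, j))
            for e in edges
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels

        # Recompute each centroid as the mean of its assigned edge vectors.
        sums = [defaultdict(float) for _ in centroids]
        counts = [0] * len(centroids)
        for (u, v), j in zip(edges, labels):
            sums[j][u] += 1.0
            sums[j][v] += 1.0
            counts[j] += 1
        for j, total in enumerate(sums):
            if counts[j]:  # keep the old centroid if a cluster went empty
                centroids[j] = {n: w / counts[j] for n, w in total.items()}
    return labels


def social_dimensions(edges, labels):
    """Map each node to the set of edge clusters it participates in."""
    dims = defaultdict(set)
    for (u, v), j in zip(edges, labels):
        dims[u].add(j)
        dims[v].add(j)
    return dict(dims)


# Toy graph: two triangles that share node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
labels = cluster_edges(edges, k=2)
dims = social_dimensions(edges, labels)
```

In this toy graph the shared node 2 ends up in both groups while every other node belongs to one. Because a node can belong to no more groups than it has edges, the node-by-dimension representation stays sparse, which is the property the abstract relies on for scalability.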
INDEX TERMS
Classification with network data, collective behavior, community detection, social dimensions.
CITATION
Lei Tang, Xufei Wang, Huan Liu, "Scalable Learning of Collective Behavior", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 6, pp. 1080-1091, June 2012, doi:10.1109/TKDE.2011.38