Issue No.07 - July (2012 vol.24)
pp: 1216-1230
Jie Chen , University of Minnesota at Twin Cities, Minneapolis
Yousef Saad , University of Minnesota at Twin Cities, Minneapolis
ABSTRACT
This paper presents a method for identifying a set of dense subgraphs of a given sparse graph. In the main applications of this “dense subgraph problem,” the dense subgraphs are interpreted as communities, for example in social networks. Identifying dense subgraphs helps in analyzing graph structures and complex networks, and the problem is known to be challenging. It bears some similarity to the problem of reordering/blocking matrices in sparse matrix techniques. We exploit this link and adapt the idea of recognizing matrix column similarities in order to compute a partial clustering of the vertices in a graph, where each cluster represents a dense subgraph. In contrast to existing subgraph extraction techniques, which are based on a complete clustering of the graph nodes, the proposed algorithm takes into account the fact that not every participating node in the network needs to belong to a community. Another advantage is that the method does not require the number of clusters to be specified; this number is usually not known in advance and is difficult to estimate. The computational process is very efficient, and the effectiveness of the proposed method is demonstrated on a few real-life examples.
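The idea of using column similarities to obtain a partial clustering can be illustrated with a small sketch. This is not the algorithm of the paper, only a simplified stand-in under stated assumptions: similarity is the cosine between columns of A + I (the adjacency matrix with self-loops added), a single similarity threshold replaces any hierarchical procedure, and the function names and the parameters `sim_threshold`, `min_density`, and `min_size` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def column_cosine(adj, i, j):
    """Cosine similarity between columns i and j of A + I.

    Adding self-loops is an assumption of this sketch; it makes the
    columns of vertices in the same clique identical.
    """
    ni = adj[i] | {i}
    nj = adj[j] | {j}
    return len(ni & nj) / sqrt(len(ni) * len(nj))


def density(adj, nodes):
    """Fraction of possible edges that are present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / (n * (n - 1) / 2)


def dense_subgraphs(adj, sim_threshold=0.7, min_density=0.8, min_size=3):
    """Group adjacent vertices with similar columns, keep only dense groups.

    Vertices that end up in no kept group stay unassigned, so the result
    is a partial clustering and the number of clusters is not an input.
    """
    parent = {v: v for v in adj}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Only adjacent pairs are compared (a simplification of this sketch).
    for u in adj:
        for v in adj[u]:
            if u < v and column_cosine(adj, u, v) >= sim_threshold:
                parent[find(u)] = find(v)

    groups = {}
    for v in adj:
        groups.setdefault(find(v), []).append(v)

    return sorted(
        sorted(g) for g in groups.values()
        if len(g) >= min_size and density(adj, g) >= min_density
    )
```

On a toy graph made of two 4-cliques joined by a single edge, with a short chain hanging off one of them, this sketch returns the two cliques and leaves the chain vertices unassigned, which is the behavior the abstract describes as partial clustering.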
INDEX TERMS
Dense subgraph, social network, community, matrix reordering, hierarchical clustering, partial clustering.
CITATION
Jie Chen, Yousef Saad, "Dense Subgraph Extraction with Application to Community Detection", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 7, pp. 1216-1230, July 2012, doi:10.1109/TKDE.2010.271