Clustering Large Probabilistic Graphs
Feb. 2013 (vol. 25 no. 2)
pp. 325-336
George Kollios, Boston University, Boston
Michalis Potamias, Smart Deals, Groupon
Evimaria Terzi, Boston University, Boston
We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
Index Terms:
Probabilistic logic, Clustering algorithms, Approximation algorithms, Partitioning algorithms, Proteins, Data mining, Approximation methods, probabilistic databases, Uncertain data, probabilistic graphs, clustering algorithms
Citation:
George Kollios, Michalis Potamias, Evimaria Terzi, "Clustering Large Probabilistic Graphs," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 2, pp. 325-336, Feb. 2013, doi:10.1109/TKDE.2011.243