A Variational Bayesian Framework for Clustering with Multiple Graphs
April 2012 (vol. 24 no. 4)
pp. 577-590
Motoki Shiga, Kyoto University, Gokasho
Hiroshi Mamitsuka, Kyoto University, Gokasho
Mining patterns in graphs has become an important issue in real applications, such as bioinformatics and web mining. We address a graph clustering problem in which a cluster is a set of densely connected nodes, under a practical setting where 1) the input is multiple graphs that share a set of nodes but have different edges and 2) a true cluster may not appear in every given graph. For this problem, we propose a probabilistic generative model and a robust learning scheme based on variational Bayesian estimation. A key feature of our probabilistic framework is that not only nodes but also the given graphs can be clustered at the same time, allowing our model to capture clusters found in only a subset of the given graphs. We empirically evaluated the effectiveness of the proposed framework on a variety of synthetic graphs as well as real gene networks, demonstrating that the proposed approach can improve on the clustering performance of competing methods on both synthetic and real data.
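To make the problem setting concrete, the following is a minimal sketch of the kind of input described above: several graphs over one shared node set, with a planted cluster that is densely connected in only some of them. It is an illustration only and is not the authors' generative model or learning scheme. The function names (`planted_graphs`, `density`), the edge probabilities `p_in` and `p_out`, and the sizes used are all assumptions chosen for the example.

```python
import random
from itertools import combinations


def planted_graphs(n_nodes, cluster, active, n_graphs, p_in=0.8, p_out=0.05, seed=0):
    """Return a list of edge sets over the shared node set 0..n_nodes-1.

    The nodes in `cluster` are densely connected (edge probability p_in)
    only in the graphs whose index is in `active`. Every other node pair,
    in every graph, gets an edge with background probability p_out.
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    cluster = set(cluster)
    graphs = []
    for g in range(n_graphs):
        edges = set()
        for u, v in combinations(range(n_nodes), 2):
            dense = g in active and u in cluster and v in cluster
            if rng.random() < (p_in if dense else p_out):
                edges.add((u, v))  # stored with u < v
        graphs.append(edges)
    return graphs


def density(edges, nodes):
    """Fraction of node pairs within `nodes` that are joined by an edge."""
    pairs = list(combinations(sorted(nodes), 2))
    return sum(pair in edges for pair in pairs) / len(pairs)


# Four graphs share 40 nodes; nodes 0..9 form a cluster in graphs 0 and 1 only.
cluster = range(10)
graphs = planted_graphs(n_nodes=40, cluster=cluster, active={0, 1}, n_graphs=4)
densities = [density(edges, cluster) for edges in graphs]
```

In this toy input the within-cluster density is high only in the graphs listed in `active`, so a method that treats all graphs as equally informative would dilute the cluster signal. Grouping the graphs as well as the nodes, as the abstract describes, is one way to avoid that.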

Index Terms:
Clustering, graphs, statistical machine learning, variational Bayesian learning, localized clusters.
Motoki Shiga, Hiroshi Mamitsuka, "A Variational Bayesian Framework for Clustering with Multiple Graphs," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 577-590, April 2012, doi:10.1109/TKDE.2010.272