Maximal Biclique Subgraphs and Closed Pattern Pairs of the Adjacency Matrix: A One-to-One Correspondence and Mining Algorithms
December 2007 (vol. 19 no. 12)
pp. 1625-1637
Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem. In this paper, we enumerate them efficiently by using closed patterns of the adjacency matrix of the graph. For an undirected graph $G$ without self-loops, we prove that: (i) the number of closed patterns in the adjacency matrix of $G$ is even; and (ii) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved with efficient algorithms for mining closed patterns, which have been studied extensively in the data mining field. However, using existing algorithms directly causes duplicated enumeration. To achieve high efficiency, we propose an algorithm with $O(mn)$ time delay for non-duplicated enumeration, in particular for enumerating large maximal bicliques, where $m$ and $n$ are the numbers of edges and vertices of the graph, respectively. We evaluate the efficiency of our algorithm by comparing it to state-of-the-art algorithms on many graphs.
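The correspondence stated in the abstract can be illustrated on a toy graph. The following is a minimal brute-force sketch, not the paper's $O(mn)$ time delay algorithm: it assumes that a closed pattern is a non-empty vertex set $X$ with a non-empty common neighborhood $N(X)$ such that $N(N(X)) = X$, and that a maximal biclique is the unordered pair $\{X, N(X)\}$. The function names and the example graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets of an undirected graph without self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            raise ValueError("self-loops are not allowed")
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, vertices):
    """Vertices adjacent to every vertex in a non-empty vertex set."""
    it = iter(vertices)
    result = set(adj[next(it)])
    for v in it:
        result &= adj[v]
    return frozenset(result)


def closed_patterns(adj):
    """Brute force (exponential): all X with N(X) non-empty and N(N(X)) == X.

    Assumption: this is the closure condition on the rows of the
    adjacency matrix; it is only practical for very small graphs.
    """
    vertices = sorted(adj)
    closed = set()
    for k in range(1, len(vertices) + 1):
        for combo in combinations(vertices, k):
            x = frozenset(combo)
            y = common_neighbors(adj, x)
            if y and common_neighbors(adj, y) == x:
                closed.add(x)
    return closed


def maximal_bicliques(adj):
    """Pair each closed pattern X with N(X); the set removes duplicates."""
    return {
        frozenset((x, common_neighbors(adj, x)))
        for x in closed_patterns(adj)
    }


if __name__ == "__main__":
    # Hypothetical example: a 4-cycle 0-2-1-3-0 plus a pendant edge 3-4.
    adj = build_adjacency([(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)])
    for pattern in sorted(closed_patterns(adj), key=sorted):
        print(sorted(pattern), "<->", sorted(common_neighbors(adj, pattern)))
```

Because the graph has no self-loops, $X$ and $N(X)$ are always disjoint, so the closed patterns fall into pairs and their count is even. A closed-pattern miner used directly would report both members of each pair, which is the duplication the abstract mentions; here the unordered pair in `maximal_bicliques` collapses the two reports into one.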


Index Terms:
Mining methods and algorithms, Graph algorithms
Citation:
Jinyan Li, Guimei Liu, Haiquan Li, Limsoon Wong, "Maximal Biclique Subgraphs and Closed Pattern Pairs of the Adjacency Matrix: A One-to-One Correspondence and Mining Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 12, pp. 1625-1637, Dec. 2007, doi:10.1109/TKDE.2007.190660