Issue No.12 - December (2008 vol.20)
pp: 1601-1615
James Cheng, The Hong Kong University of Science and Technology, Hong Kong
Wilfred Ng, The Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.
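As an illustration of the measure the abstract names, the sketch below computes Pearson's correlation coefficient for two graphs from their occurrence sets in a database. It uses the standard form of the coefficient for two binary variables (the phi coefficient). The function name, the representation of occurrences as sets of database graph IDs, and the choice to return 0.0 when the coefficient is undefined are assumptions made for this example and are not taken from the paper. Deciding which database graphs contain a given graph (subgraph isomorphism testing) is outside the sketch.

```python
from math import sqrt


def pearson_correlation(occ_a, occ_b, n):
    """Pearson's correlation (phi coefficient) of two occurrence sets.

    occ_a, occ_b: sets of IDs of the database graphs that contain
                  graph a and graph b, respectively (hypothetical layout).
    n:            total number of graphs in the database.
    """
    if n <= 0:
        raise ValueError("database must contain at least one graph")

    supp_a = len(occ_a) / n           # occurrence probability of a
    supp_b = len(occ_b) / n           # occurrence probability of b
    supp_ab = len(occ_a & occ_b) / n  # joint occurrence probability

    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    if denom == 0:
        # Undefined when a graph occurs in none or all of the database
        # graphs; returning 0.0 here is a convention of this sketch.
        return 0.0
    return (supp_ab - supp_a * supp_b) / denom


# Two graphs that always occur together are perfectly correlated;
# two that never co-occur (and split the database evenly) are not.
same = pearson_correlation({0, 1, 2}, {0, 1, 2}, 6)
disjoint = pearson_correlation({0, 1, 2}, {3, 4, 5}, 6)
independent = pearson_correlation({0, 1}, {1, 2}, 4)
```

Because the coefficient depends only on the two supports and the joint support, a bound on a candidate's occurrence probability restricts which candidates can reach a given correlation threshold, which is the kind of pruning the abstract describes.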
INDEX TERMS
Data mining, Mining methods and algorithms
CITATION
James Cheng, Wilfred Ng, "Efficient Correlation Search from Graph Databases", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 12, pp. 1601-1615, December 2008, doi:10.1109/TKDE.2008.86