Issue No. 12 - December (2008 vol. 20)
James Cheng, The Hong Kong University of Science and Technology, Hong Kong
Wilfred Ng, The Hong Kong University of Science and Technology, Hong Kong
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.86
Correlation mining has been highly successful in many application domains because it captures the underlying dependencies between objects. However, research on correlation mining from graph databases is still lacking, despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. The problem poses significant challenges, since every subgraph of a graph in the database is a candidate and the number of subgraphs is exponential. We derive two necessary conditions that bound the occurrence probability of a candidate in the database. Using this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database, thereby obtaining a significantly smaller set of candidates. Three heuristic rules are further developed to refine the candidate set. We also use the bounds to directly answer high-support queries without mining the candidates. Experimental results demonstrate the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution for most existing correlation measures.
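As a rough illustration of the measure named in the abstract, the sketch below computes Pearson's correlation coefficient between two graphs from their occurrence in a database, using the standard form for binary variables (the phi coefficient). It is not the authors' algorithm: subgraph containment is assumed to be precomputed as 0/1 indicator lists, and the function names `pearson_correlation` and `support_bounds` are hypothetical. The support bounds follow from algebra on the phi formula (the correlation is largest when one graph's occurrences contain the other's) and are offered only as one plausible reading of the "necessary conditions"; the paper's exact statement may differ.

```python
from math import sqrt


def pearson_correlation(occ_q, occ_g):
    """Phi coefficient between two 0/1 occurrence lists.

    occ_q[i] is 1 if database graph i contains query graph q, else 0;
    occ_g[i] likewise for candidate graph g. The containment test itself
    (subgraph isomorphism) is assumed to have been done elsewhere.
    """
    n = len(occ_q)
    if n == 0 or n != len(occ_g):
        raise ValueError("occurrence lists must be non-empty and equal length")
    supp_q = sum(occ_q) / n
    supp_g = sum(occ_g) / n
    supp_qg = sum(a and b for a, b in zip(occ_q, occ_g)) / n
    denom = sqrt(supp_q * supp_g * (1 - supp_q) * (1 - supp_g))
    if denom == 0:
        raise ValueError("correlation undefined when a support is 0 or 1")
    return (supp_qg - supp_q * supp_g) / denom


def support_bounds(supp_q, theta):
    """Range of supp(g) outside which phi(q, g) >= theta is impossible.

    Derived by setting the joint support to min(supp_q, supp_g), which
    maximizes phi, and solving phi >= theta for supp_g.
    """
    if not (0 < supp_q < 1 and 0 < theta <= 1):
        raise ValueError("need 0 < supp_q < 1 and 0 < theta <= 1")
    lower = supp_q / ((1 - supp_q) / theta**2 + supp_q)
    upper = supp_q / (theta**2 * (1 - supp_q) + supp_q)
    return lower, upper


if __name__ == "__main__":
    # Toy database of 10 graphs; q occurs in 4, g in 4, both in 3.
    occ_q = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    occ_g = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(pearson_correlation(occ_q, occ_g))
    print(support_bounds(0.4, 0.5))
```

A candidate whose support falls outside the returned range cannot reach the correlation threshold, so it can be discarded without computing its joint support with the query.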
Data mining, Mining methods and algorithms
James Cheng, Wilfred Ng, "Efficient Correlation Search from Graph Databases", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 12, pp. 1601-1615, December 2008, doi:10.1109/TKDE.2008.86