
Issue No.12 - December (2008 vol.20)

pp: 1601-1615

Yiping Ke, The Hong Kong University of Science and Technology, Hong Kong

James Cheng, The Hong Kong University of Science and Technology, Hong Kong

Wilfred Ng, The Hong Kong University of Science and Technology, Hong Kong

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.86

ABSTRACT

Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking despite the proliferation of graph data in recent years. We propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient to take into account the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database and thus a significantly smaller set of candidates is obtained. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Experimental results justify the efficiency of our algorithm. Finally, we generalize the CGS problem and show that our algorithm provides a general solution to most of the existing correlation measures.
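As a rough illustration of the measure the abstract names, the sketch below computes Pearson's correlation coefficient for two binary occurrence variables (the phi coefficient) from support values. The abstract does not spell out the formula, so the standard phi definition is assumed here; the function names `support` and `pearson_phi`, the toy occurrence sets, and the zero-denominator convention are all hypothetical and not taken from the paper.

```python
from math import sqrt


def support(graph_ids: set, db_size: int) -> float:
    """Fraction of database graphs that contain a given pattern.

    `graph_ids` is the set of database graph identifiers in which the
    pattern occurs (an assumed representation, for illustration only).
    """
    return len(graph_ids) / db_size


def pearson_phi(supp_a: float, supp_b: float, supp_ab: float) -> float:
    """Pearson's correlation for two binary occurrence variables.

    Standard phi coefficient:
        (supp(a,b) - supp(a) * supp(b))
        / sqrt(supp(a) * supp(b) * (1 - supp(a)) * (1 - supp(b)))
    """
    denom = sqrt(supp_a * supp_b * (1.0 - supp_a) * (1.0 - supp_b))
    if denom == 0.0:
        # Assumed convention: a pattern that occurs in no graph or in
        # every graph carries no correlation information.
        return 0.0
    return (supp_ab - supp_a * supp_b) / denom


# Toy database of 10 graphs; the occurrence sets are made up.
db_size = 10
occ_query = {0, 1, 2, 3}
occ_candidate = {0, 1, 2, 4}

phi = pearson_phi(
    support(occ_query, db_size),
    support(occ_candidate, db_size),
    support(occ_query & occ_candidate, db_size),
)
print(f"phi = {phi:.4f}")
```

Because the coefficient depends only on the three support values, a bound on a candidate's support translates directly into a bound on its achievable correlation, which is the kind of pruning condition the abstract describes.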

INDEX TERMS

Data mining, Mining methods and algorithms

CITATION

Yiping Ke, James Cheng, Wilfred Ng, "Efficient Correlation Search from Graph Databases",

*IEEE Transactions on Knowledge & Data Engineering*, vol.20, no. 12, pp. 1601-1615, December 2008, doi:10.1109/TKDE.2008.86