Issue No. 09, September 2010 (vol. 22)
pp. 1203-1218
Zhaonian Zou, Harbin Institute of Technology, Harbin, China
Jianzhong Li, Harbin Institute of Technology, Harbin, China
Hong Gao, Harbin Institute of Technology, Harbin, China
Shuo Zhang, Harbin Institute of Technology, Harbin, China
ABSTRACT
In many real applications, graph data are subject to uncertainty because the data are incomplete or imprecise. Mining such uncertain graph data is semantically different from, and computationally more challenging than, mining conventional exact graph data. This paper investigates the problem of mining uncertain graph data and focuses in particular on mining frequent subgraph patterns from an uncertain graph database. A novel model of uncertain graphs is presented, and the frequent subgraph pattern mining problem is formalized by introducing a new measure, called expected support: the expected value of a pattern's support over all possible instances of the uncertain graph database. This problem is proved to be NP-hard. An approximate mining algorithm is proposed that finds a set of approximately frequent subgraph patterns by allowing an error tolerance on the expected supports of the discovered patterns. The algorithm uses efficient methods to decide whether a subgraph pattern can be output, together with a new pruning method that reduces the complexity of examining subgraph patterns. Analytical and experimental results show that the algorithm is efficient, accurate, and scalable for large uncertain graph databases. To the best of our knowledge, this is the first paper to investigate the problem of mining frequent subgraph patterns from uncertain graph data.
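The notion of expected support can be illustrated with a small, brute-force sketch. It assumes, for illustration only, that each edge of an uncertain graph exists independently with a given probability (a simplification; the paper's own uncertain-graph model may differ), and that graphs are tiny enough to enumerate every possible world. Because support is the fraction of database graphs that contain a pattern S, linearity of expectation gives esup(S) = (1/n) Σ_i Pr[S ⊆ G_i]. The names `UncertainGraph`, `contains`, `containment_probability`, and `expected_support` are hypothetical, and this exact enumeration is not the paper's approximate mining algorithm.

```python
from fractions import Fraction
from itertools import permutations, product


class UncertainGraph:
    """Hypothetical toy model: labeled vertices plus undirected edges that
    exist independently with the given probability (assumption of this sketch)."""

    def __init__(self, labels, edge_probs):
        self.labels = labels          # vertex -> label
        self.edge_probs = edge_probs  # frozenset({u, v}) -> existence probability


def contains(p_labels, p_edges, labels, edge_set):
    """Brute-force test: is the labeled pattern subgraph-isomorphic to the
    certain graph (labels, edge_set)? Exponential, so toy inputs only."""
    p_vertices = list(p_labels)
    for image in permutations(labels, len(p_vertices)):
        m = dict(zip(p_vertices, image))  # injective map pattern -> graph
        if all(p_labels[v] == labels[m[v]] for v in p_vertices) and all(
            frozenset((m[a], m[b])) in edge_set for a, b in p_edges
        ):
            return True
    return False


def containment_probability(pattern, g):
    """Pr[pattern is contained in a possible world of g], computed exactly by
    enumerating all 2^|E| subsets of g's edges."""
    p_labels, p_edges = pattern
    edges = list(g.edge_probs.items())
    total = Fraction(0)
    for choice in product((False, True), repeat=len(edges)):
        world_prob = Fraction(1)
        world_edges = set()
        for (edge, p), present in zip(edges, choice):
            world_prob *= p if present else 1 - p
            if present:
                world_edges.add(edge)
        if contains(p_labels, p_edges, g.labels, world_edges):
            total += world_prob
    return total


def expected_support(pattern, database):
    """Linearity of expectation: average the per-graph containment probabilities."""
    return sum(containment_probability(pattern, g) for g in database) / len(database)


# Usage: a one-edge pattern C-O over a two-graph uncertain database.
half = Fraction(1, 2)
g1 = UncertainGraph({"a": "C", "b": "O"}, {frozenset(("a", "b")): half})
g2 = UncertainGraph(
    {"x": "C", "y": "O", "z": "O"},
    {frozenset(("x", "y")): half, frozenset(("x", "z")): half},
)
pattern = ({0: "C", 1: "O"}, [(0, 1)])
print(expected_support(pattern, [g1, g2]))  # prints 5/8
```

Here Pr[S ⊆ G1] = 1/2 and Pr[S ⊆ G2] = 1 - (1/2)(1/2) = 3/4, so the expected support is 5/8. Exact rational arithmetic (`Fraction`) keeps the result free of rounding; the cost of enumerating every possible world grows exponentially with the number of edges, which is one reason an approximate approach is attractive for large databases.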
INDEX TERMS
Graph mining, uncertain graph, frequent subgraph pattern, algorithm.
CITATION
Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang, "Mining Frequent Subgraph Patterns from Uncertain Graph Data," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 9, pp. 1203-1218, September 2010, doi:10.1109/TKDE.2010.80