Issue No.08 - August (2010 vol.22)

pp: 1093-1109

Lei Chen, Hong Kong University of Science and Technology, Hong Kong

Changliang Wang, EMC, China

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.67

ABSTRACT

Search over graph databases has attracted much attention recently due to its usefulness in many fields, such as the analysis of chemical compounds, intrusion detection in network traffic data, and pattern matching over users' visiting logs. However, most existing work focuses on search over static graph databases, while in many real applications, graphs change over time. In this paper, we investigate a new problem, continuous subgraph pattern search where multiple target graphs change constantly in a stream style, namely, subgraph pattern search over graph streams. The proposed problem is a continuous join between query patterns and graph streams where the join predicate is the existence of a subgraph isomorphism. Because subgraph isomorphism checking is NP-complete, to monitor the existence of certain subgraph patterns in real time, we avoid using subgraph isomorphism verification to find the exact query-stream isomorphic pairs and instead offer an approximate answer that reports all probable pairs without missing any actual answer pair. To this end, we propose a lightweight yet effective feature structure called the Node-Neighbor Tree to filter out false candidate query-stream pairs. To reduce the computational cost, we project the feature structures into a numerical vector space and conduct dominant relationship checking in the projected space. We design two methods to efficiently verify dominant relationships and thus answer subgraph search over graph streams efficiently. In addition to answering queries over certain graph streams, we propose a novel problem: detecting the appearance of subgraph patterns over uncertain graph streams with high probability (i.e., larger than a user-specified probability threshold).
To address this problem, we not only extend the proposed solutions for certain graph streams but also propose a new pruning technique that utilizes the probability threshold. We substantiate our methods with extensive experiments on both certain and uncertain graph streams.
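
As a rough illustration of the filter-by-dominance idea summarized in the abstract, the sketch below prunes query-stream pairs using a much simpler feature than the Node-Neighbor Tree: per-label node counts. This is an assumption made purely for illustration and is not the paper's projection. The function names (`label_vector`, `dominates`, `candidate_pairs`) and the toy data are hypothetical. The property it relies on is that if a query pattern is subgraph-isomorphic to a graph under a label-preserving mapping, the graph must contain at least as many nodes of each label as the query, so the graph's vector must dominate the query's vector. Pairs that fail the test can be discarded safely; pairs that pass are only candidates.

```python
from collections import Counter


def label_vector(node_labels, alphabet):
    """Project a graph (given only by its node labels) to a count vector.

    Assumption: per-label node counts stand in for the richer feature
    structure described in the abstract.
    """
    counts = Counter(node_labels)
    return [counts[label] for label in alphabet]


def dominates(stream_vec, query_vec):
    """Return True if stream_vec is >= query_vec in every dimension."""
    return all(s >= q for s, q in zip(stream_vec, query_vec))


def candidate_pairs(queries, streams, alphabet):
    """Keep only (query, stream graph) pairs that pass the dominance filter.

    The filter never drops a true answer pair, but surviving pairs may
    still be false positives.
    """
    pairs = []
    for q_name, q_labels in queries.items():
        q_vec = label_vector(q_labels, alphabet)
        for s_name, s_labels in streams.items():
            if dominates(label_vector(s_labels, alphabet), q_vec):
                pairs.append((q_name, s_name))
    return pairs


# Hypothetical toy data: node labels of one query and two stream graphs.
alphabet = ["C", "N", "O"]
queries = {"q1": ["C", "C", "O"]}
streams = {"g1": ["C", "C", "C", "O", "N"], "g2": ["C", "O", "N"]}
result = candidate_pairs(queries, streams, alphabet)
```

Here `g2` is pruned because it has only one `C` node while the query needs two, and `g1` survives as a candidate. In a streaming setting the stream-side vectors would be updated incrementally as graphs change rather than recomputed from scratch.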

INDEX TERMS

Subgraph search, node-neighbor tree, graph streams, uncertain graph streams.

CITATION

Lei Chen, Changliang Wang, "Continuous Subgraph Pattern Search over Certain and Uncertain Graph Streams",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 22, no. 8, pp. 1093-1109, August 2010, doi:10.1109/TKDE.2010.67