Issue No.12 - December (2008 vol.20)
pp: 1627-1640
Hua-Gang Li, NEC Laboratories America, Cupertino
Jun'ichi Tatemura, NEC Laboratories America, Cupertino
Wang-Pin Hsiung, NEC Laboratories America, Cupertino
Divyakant Agrawal, NEC Laboratories America, Cupertino
K. Selçuk Candan, NEC Laboratories America, Cupertino
ABSTRACT
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering simple XPath statements. In this paper, we focus on filtering the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be produced efficiently via shared bottom-up path matching. Second, with the aid of the TOP encoding, we can 1) achieve polynomial time and space complexity for postprocessing, 2) avoid redundant predicate evaluations, 3) allow an efficient, duplicate-free, merge-join-based algorithm for merging multiple encoded path matches, and 4) simplify the processing of GTP queries. Overall, our approach maximizes the sharing opportunities across queries by exploiting suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient postprocessing for GTP queries. Extensive performance studies show that our GFilter solution not only achieves significantly better filtering performance than state-of-the-art algorithms, but is also capable of efficiently filtering the more complex GTP queries.
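The idea of sharing work across queries through common prefixes and suffixes can be illustrated with a small sketch. This is not the GFilter algorithm or the TOP encoding from the paper; it only counts how many distinct path steps remain when a set of simple child-axis path queries is folded into a prefix trie or a suffix trie. The helper names `parse_path` and `trie_size` and the three sample queries are assumptions made for illustration.

```python
from typing import Dict, List


def parse_path(query: str) -> List[str]:
    """Split a simple child-axis path such as '/a/b/c' into its steps."""
    return [step for step in query.split("/") if step]


def trie_size(paths: List[List[str]]) -> int:
    """Count the nodes of a trie built from the given step sequences.

    Each node stands for one path step that has to be matched once,
    no matter how many queries pass through it.
    """
    root: Dict[str, dict] = {}
    count = 0
    for steps in paths:
        node = root
        for step in steps:
            if step not in node:
                node[step] = {}
                count += 1
            node = node[step]
    return count


# Hypothetical sample queries (not taken from the paper).
queries = ["/a/b/c", "/a/b/d", "/x/b/c"]
paths = [parse_path(q) for q in queries]

# Steps to match if every query is evaluated on its own.
total_steps = sum(len(p) for p in paths)

# Steps to match when common prefixes are shared (root-to-leaf trie).
prefix_nodes = trie_size(paths)

# Steps to match when common suffixes are shared (leaf-to-root trie).
suffix_nodes = trie_size([list(reversed(p)) for p in paths])

print(total_steps, prefix_nodes, suffix_nodes)
```

In this sample, prefix sharing merges `/a/b` across the first two queries, while suffix sharing merges `b/c` across the first and third. Neither form alone captures both overlaps, which is one way to read the abstract's motivation for exploiting both.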
INDEX TERMS
XML filtering, XML streams, generalized-tree-pattern queries, result encoding
CITATION
Hua-Gang Li, Jun'ichi Tatemura, Wang-Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan, "Scalable Filtering of Multiple Generalized-Tree-Pattern Queries over XML Streams", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 12, pp. 1627-1640, December 2008, doi:10.1109/TKDE.2008.83