Issue No.03 - March (2011 vol.23)
pp: 402-416
Jiaheng Lu , Renmin University of China, Beijing
Tok Wang Ling , National University of Singapore, Singapore
Zhifeng Bao , National University of Singapore, Singapore
Chen Wang , IBM China Research Lab, Beijing
ABSTRACT
As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior work demonstrates that holistic twig pattern matching is an efficient technique for answering XML tree patterns with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions, such as the negation function, order-based axes, and wildcards. In this paper, we study a large class of XML tree patterns, called extended XML tree patterns, which may include P-C and A-D relationships, negation functions, wildcards, and order restrictions. We establish a theoretical framework around the notion of “matching cross,” which explains the intrinsic reason behind the proof of optimality of holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. Experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of the proposed theories and algorithms.
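To make the pattern features named in the abstract concrete, the following is a minimal Python sketch of what it means for an extended tree pattern to match at a node. It is a naive recursive check, not the holistic algorithms proposed in the paper, and it leaves out order restrictions. The names `Node`, `Pattern`, `matches_at`, and `find_matches`, as well as the sample document, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """An XML element reduced to its tag and child elements."""
    tag: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A query node; all field names are illustrative assumptions."""
    tag: str                  # element name, or "*" for a wildcard
    axis: str = "child"       # edge from the parent query node:
                              # "child" (P-C) or "descendant" (A-D)
    negated: bool = False     # True: this branch must NOT be matched
    children: List["Pattern"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield every proper descendant of node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches_at(node: Node, pat: Pattern) -> bool:
    """Return True if the pattern rooted at pat can be embedded at node."""
    if pat.tag != "*" and pat.tag != node.tag:
        return False
    for sub in pat.children:
        candidates = node.children if sub.axis == "child" else descendants(node)
        found = any(matches_at(c, sub) for c in candidates)
        # A positive branch must be found; a negated branch must be absent.
        if found == sub.negated:
            return False
    return True


def find_matches(root: Node, pat: Pattern) -> List[Node]:
    """Return the document nodes at which the pattern root matches."""
    return [n for n in [root, *descendants(root)] if matches_at(n, pat)]


# Hypothetical sample document:
# <book><title/><chapter><section><figure/></section></chapter>
#       <chapter><section/></chapter></book>
doc = Node("book", [
    Node("title"),
    Node("chapter", [Node("section", [Node("figure")])]),
    Node("chapter", [Node("section")]),
])

# chapter[.//figure]: A-D edge from chapter to figure
with_figure = Pattern("chapter", children=[Pattern("figure", axis="descendant")])

# chapter[not(.//figure)]: the same branch, negated
without_figure = Pattern(
    "chapter", children=[Pattern("figure", axis="descendant", negated=True)]
)

# book[*/section]: a wildcard step between book and section (two P-C edges)
wildcard = Pattern("book", children=[Pattern("*", children=[Pattern("section")])])
```

This check revisits subtrees repeatedly and reports only the nodes that match the pattern root, so it illustrates the semantics of P-C edges, A-D edges, wildcards, and negation rather than an efficient evaluation strategy.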
INDEX TERMS
Query processing, XML/XSL/RDF, algorithms, tree pattern.
CITATION
Jiaheng Lu, Tok Wang Ling, Zhifeng Bao, Chen Wang, "Extended XML Tree Pattern Matching: Theories and Algorithms", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 3, pp. 402-416, March 2011, doi:10.1109/TKDE.2010.126