Issue No.03 - March (2011 vol.23)
Tok Wang Ling, National University of Singapore, Singapore
Zhifeng Bao, National University of Singapore, Singapore
Chen Wang, IBM China Research Lab, Beijing
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.126
As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior work demonstrates that holistic twig pattern matching algorithms are an efficient technique for answering XML tree patterns with parent-child (P-C) and ancestor-descendant (A-D) relationships, as they can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions, such as the negation function, order-based axes, and wildcards. In this paper, we study a large set of XML tree patterns, called extended XML tree patterns, which may include P-C and A-D relationships, negation functions, wildcards, and order restrictions. We establish a theoretical framework around the notion of “matching cross,” which explains the intrinsic reason behind the optimality proofs for holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. Experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms.
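To make the notion of an extended tree pattern concrete, the following is a minimal sketch of a naive recursive matcher. It is not the paper's holistic algorithm and makes no attempt to bound intermediate results. The `QNode` class, its field names, and the sample document are hypothetical and chosen only for illustration. The sketch covers P-C edges, A-D edges, wildcards, and negated branches; order restrictions are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class QNode:
    """Hypothetical query node (illustrative only, not from the paper)."""
    tag: str                  # element name, or "*" for a wildcard
    axis: str = "child"       # edge from the parent query node: "child" (P-C) or "descendant" (A-D)
    negated: bool = False     # True means this branch must NOT be matched
    children: list = field(default_factory=list)

def candidates(elem, axis):
    """Elements reachable from elem along the given axis."""
    if axis == "child":
        return list(elem)
    it = elem.iter()          # iter() yields elem itself first, then its descendants
    next(it)                  # skip elem so only proper descendants remain
    return list(it)

def matches(elem, q):
    """True if elem satisfies q's tag test and every branch below q."""
    if q.tag != "*" and elem.tag != q.tag:
        return False
    for branch in q.children:
        found = any(matches(e, branch) for e in candidates(elem, branch.axis))
        if found == branch.negated:   # a positive branch is missing, or a negated one is present
            return False
    return True

def find_matches(root, q):
    """All elements in the document that match the root of pattern q."""
    return [e for e in root.iter() if matches(e, q)]

doc = ET.fromstring("""
<bib>
  <book><title>A</title><author><name>X</name></author></book>
  <book><title>B</title><editor><name>Y</name></editor></book>
  <book><title>C</title></book>
</bib>
""")

# Roughly the XPath //book[title][.//name][not(author)]
query = QNode("book", children=[
    QNode("title"),                      # P-C edge
    QNode("name", axis="descendant"),    # A-D edge
    QNode("author", negated=True),       # negated branch
])
titles = [b.find("title").text for b in find_matches(doc, query)]

# Wildcard pattern: any element that has a title child
wildcard_hits = [e.tag for e in find_matches(doc, QNode("*", children=[QNode("title")]))]
```

This matcher revisits subtrees repeatedly, which is the kind of cost that holistic approaches are designed to avoid. It is meant only to show what each pattern feature means.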
Query processing, XML/XSL/RDF, algorithms, tree pattern.
Tok Wang Ling, Zhifeng Bao, Chen Wang, "Extended XML Tree Pattern Matching: Theories and Algorithms", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 3, pp. 402-416, March 2011, doi:10.1109/TKDE.2010.126