20th International Conference on Data Engineering (ICDE'04)
Selectivity Estimation for XML Twigs
Boston, Massachusetts
March 30-April 02
ISBN: 0-7695-2065-0
ASCII Text:
Neoklis Polyzotis, Minos Garofalakis, Yannis Ioannidis, "Selectivity Estimation for XML Twigs," Data Engineering, International Conference on, pp. 264, 20th International Conference on Data Engineering (ICDE'04), 2004.

BibTeX:
@article{ 10.1109/ICDE.2004.1320003,
author = {Neoklis Polyzotis and Minos Garofalakis and Yannis Ioannidis},
title = {Selectivity Estimation for XML Twigs},
journal = {Data Engineering, International Conference on},
volume = {0},
year = {2004},
issn = {1063-6382},
pages = {264},
doi = {http://doi.ieeecomputersociety.org/10.1109/ICDE.2004.1320003},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}

RefWorks/ProCite/RefMan/EndNote:
TY  - CONF
JO  - Data Engineering, International Conference on
TI  - Selectivity Estimation for XML Twigs
SN  - 1063-6382
A1  - Neoklis Polyzotis
A1  - Minos Garofalakis
A1  - Yannis Ioannidis
PY  - 2004
VL  - 0
JA  - Data Engineering, International Conference on
ER  -
Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. In this paper, we propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses, and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.
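To make the idea of "joins as points in a multidimensional space of path counts" concrete, here is a minimal Python sketch under deliberately simplified assumptions. It handles only a one-level twig `anchor[b1][b2]...` over the child axis. The names `Node`, `path_count_points`, `twig_selectivity_from_points`, and `twig_selectivity_independent` are hypothetical illustrations, not the XSKETCH synopsis or the paper's estimation algorithm. Each anchor element becomes a point whose coordinates are its per-branch child counts. The exact twig cardinality is the sum, over those points, of the product of the coordinates. An estimator that keeps only per-branch averages and assumes independence can badly miss correlations between branches.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal XML-like element: a tag plus ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def iter_nodes(root: Node):
    """Yield every element of the tree in preorder."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def path_count_points(root: Node, anchor: str, branches: tuple[str, ...]) -> Counter:
    """Map each `anchor` element to a point of child counts (one coordinate per
    branch tag, child axis only) and return the multiset of those points."""
    points: Counter = Counter()
    for node in iter_nodes(root):
        if node.tag == anchor:
            child_tags = Counter(child.tag for child in node.children)
            points[tuple(child_tags[b] for b in branches)] += 1
    return points


def twig_selectivity_from_points(points: Counter) -> int:
    """Exact tuple count of anchor[b1][b2]...: each anchor element contributes
    the product of its branch counts (one tuple per combination of children)."""
    total = 0
    for point, multiplicity in points.items():
        product = 1
        for count in point:
            product *= count
        total += multiplicity * product
    return total


def twig_selectivity_independent(points: Counter) -> float:
    """Estimate keeping only per-branch averages, assuming the branch counts
    are independent: N * avg(c1) * avg(c2) * ..."""
    n = sum(points.values())
    if n == 0:
        return 0.0
    dims = len(next(iter(points)))
    estimate = float(n)
    for d in range(dims):
        estimate *= sum(p[d] * m for p, m in points.items()) / n
    return estimate


# Two <a> elements with perfectly anti-correlated branches:
# one has three <b> children, the other has three <c> children.
doc = Node("root", [
    Node("a", [Node("b"), Node("b"), Node("b")]),
    Node("a", [Node("c"), Node("c"), Node("c")]),
])
pts = path_count_points(doc, "a", ("b", "c"))
print(twig_selectivity_from_points(pts))   # 0
print(twig_selectivity_independent(pts))   # 4.5
```

In this toy document no `a` element has both a `b` and a `c` child, so the twig `a[b][c]` returns no tuples, yet the independence-based estimate reports 4.5. Retaining the joint distribution of the count points (even approximately, as a histogram) is what lets an estimator capture such cross-branch correlation.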
