Issue No.01 - Jan. (2013 vol.25)
pp: 29-46
Marouane Hachicha, Université de Lyon (ERIC Lyon 2), Bron
Jérôme Darmont, Université de Lyon (ERIC Lyon 2), Bron
ABSTRACT
With XML becoming a ubiquitous language for data interoperability purposes in various domains, efficiently querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over data trees. They are matched against an input data tree to answer a query. Since the turn of the 21st century, a considerable research effort has focused on tree pattern models and matching optimization (a crucial issue). This paper is a comprehensive survey of these topics, in which we outline and compare the various features of tree patterns. We also review and discuss the two main families of approaches for optimizing tree pattern matching, namely pattern tree minimization and holistic matching. We finally present actual tree pattern-based developments, to provide a global overview of this significant research topic.
INDEX TERMS
XML, Pattern matching, Vegetation, Algebra, Optimization, Data models, Computers, tree pattern rewriting, XML querying, data tree, tree pattern, tree pattern query, twig pattern, matching, containment, tree pattern minimization, holistic matching, tree pattern mining
CITATION
Marouane Hachicha, Jérôme Darmont, "A Survey of XML Tree Patterns", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 1, pp. 29-46, Jan. 2013, doi:10.1109/TKDE.2011.209