Issue No. 12, Dec. 2012 (vol. 24)
pp. 2244-2259
Xiaoying Wu, Wuhan University, Wuhan
Stefanos Souldatos, National Technical University of Athens, Athens
Dimitri Theodoratos, New Jersey Institute of Technology, Newark
Theodore Dalamagas, Institute for the Management of Information Systems, Research Center "Athena", Athens
Yannis Vassiliou, National Technical University of Athens, Athens
Timos Sellis, Institute for the Management of Information Systems, Research Center "Athena", Athens
ABSTRACT
XML query languages typically allow the specification of structural patterns using XPath. Usually, these structural patterns take the form of trees (Tree-Pattern Queries, TPQs). Finding the occurrences of such patterns in an XML tree is a key operation in XML query evaluation. Many algorithms have previously been proposed for this operation, but they focus mainly on the evaluation of TPQs. Recently, requirements for flexible querying of XML data have motivated the consideration of query classes that are more expressive and flexible than TPQs and for which no efficient non-main-memory evaluation algorithms are known. In this paper, we consider a class of queries, called Partial Tree-Pattern Queries (PTPQs), which generalize and strictly contain TPQs. PTPQs represent a broad fragment of XPath that is very useful in practice. To process PTPQs, we introduce a set of sound and complete inference rules that characterize the derivation of structural relationships. We provide necessary and sufficient conditions for detecting query unsatisfiability and node redundancy. We also show that PTPQs can be represented as directed acyclic graphs augmented with “same-path” constraints. To leverage existing efficient evaluation algorithms for less expressive classes of queries, we design two approaches that evaluate a PTPQ by decomposing it into a set of simpler queries. Algorithm IndexTPQGen exploits a structural summary of the XML data and evaluates a PTPQ by generating an equivalent set of TPQs and taking the union of their answers. Algorithm PartialPathJoin decomposes the PTPQ into partial-path queries and merge-joins their solutions. We also develop PartialTreeStack, an original polynomial-time holistic algorithm for PTPQs. To the best of our knowledge, this is the first algorithm to support the evaluation of such a broad structural fragment of XPath in the inverted lists evaluation model.
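The abstract mentions inference rules for deriving structural relationships and conditions for detecting unsatisfiable queries, but does not state them. The following is a minimal sketch of one ingredient only, not the paper's rule set. It assumes a query graph given as hypothetical (x, y, kind) triples, where kind is "child" or "descendant" and both kinds make y a strict descendant of x. Under that assumption, it derives the implied ancestor-descendant pairs by transitivity (Warshall's algorithm) and flags the query as unsatisfiable when some node is derived to be its own strict descendant, since such a cycle can never be matched in a tree. The function names descendant_closure, is_unsatisfiable, and same_path_implied are illustrative.

```python
def descendant_closure(nodes, edges):
    """Derive every (ancestor, descendant) pair implied by the edges.

    Assumption: `edges` holds (x, y, kind) triples with kind "child" or
    "descendant"; either kind makes y a strict descendant of x.
    Transitivity is applied with Warshall's algorithm.
    """
    reach = {(x, y) for x, y, _kind in edges}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach


def is_unsatisfiable(nodes, edges):
    """A node derived to be its own strict descendant (a cycle) has no
    match in any tree, so the whole query is unsatisfiable."""
    reach = descendant_closure(nodes, edges)
    return any((x, x) in reach for x in nodes)


def same_path_implied(x, y, reach):
    """Two query nodes must lie on one root-to-leaf path whenever one is
    derived to be an ancestor of the other (or they are the same node)."""
    return x == y or (x, y) in reach or (y, x) in reach


# Usage: a/b//c implies that c is a descendant of a.
nodes = ["a", "b", "c"]
edges = [("a", "b", "child"), ("b", "c", "descendant")]
reach = descendant_closure(nodes, edges)
print(("a", "c") in reach, is_unsatisfiable(nodes, edges))

# a//b together with b//a forms a cycle and cannot be matched.
cyclic = [("a", "b", "descendant"), ("b", "a", "descendant")]
print(is_unsatisfiable(["a", "b"], cyclic))
```

Only transitivity is modeled here. Same-path constraints appear solely as a derived consequence, and the other conditions the paper characterizes, such as node redundancy, are not covered.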
We provide a theoretical analysis of PartialTreeStack and identify cases where it is asymptotically optimal. An extensive experimental evaluation shows that it is more efficient, robust, and stable than the other two approaches and that it outperforms a state-of-the-art XQuery engine on PTPQs.
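The abstract places PartialTreeStack in the inverted lists evaluation model but does not describe that model. A common realization, assumed here, keeps one list per element label, sorted by a region encoding (start, end, level). Under this encoding, element a is an ancestor of element d exactly when a.start < d.start and d.end < a.end, and a is d's parent when, in addition, a.level + 1 == d.level. The following sketch shows a stack-based ancestor-descendant join over two such sorted lists, in the style of the Stack-Tree-Desc structural join of Al-Khalifa et al. (ICDE 2002). It illustrates the merge-over-sorted-lists primitive that holistic algorithms build on. It is not PartialTreeStack, and the names Region, is_ancestor, is_parent, and structural_join are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Assumed region label of one XML element."""
    start: int
    end: int
    level: int


def is_ancestor(a: Region, d: Region) -> bool:
    # Interval containment encodes the ancestor-descendant relationship.
    return a.start < d.start and d.end < a.end


def is_parent(a: Region, d: Region) -> bool:
    # Parent-child: containment plus adjacent levels.
    return is_ancestor(a, d) and a.level + 1 == d.level


def structural_join(ancestors: List[Region],
                    descendants: List[Region]) -> List[Tuple[Region, Region]]:
    """Join two lists sorted by start; output is ordered by descendant."""
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # always a chain of nested regions
    i = 0
    for d in descendants:
        # Push candidates that start before d, first discarding stack
        # entries that end before the candidate starts.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Discard entries that end before d starts; they cannot contain d.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every remaining entry contains d, because the stack is nested.
        for a in stack:
            out.append((a, d))
    return out


# Usage on <a><b><c/><c/></b><c/></a>, labeled with one start/end counter.
a = Region(1, 10, 1)
b = Region(2, 7, 2)
c1, c2, c3 = Region(3, 4, 3), Region(5, 6, 3), Region(8, 9, 2)
a_desc_c = structural_join([a], [c1, c2, c3])          # a//c
a_child_c = [p for p in a_desc_c if is_parent(*p)]     # a/c
b_desc_c = structural_join([b], [c1, c2, c3])          # b//c
```

Each ancestor is pushed and popped at most once, and every stack entry examined for a descendant yields an output pair. The join therefore runs in time linear in the sizes of the two input lists plus the size of the output.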
INDEX TERMS
XML, Algorithm design and analysis, Database languages, Query processing, Polynomials, Semantics, Data models, partial tree-pattern query, XML query processing, XPath query evaluation, tree-pattern query
CITATION
Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Yannis Vassiliou, Timos Sellis, "Processing and Evaluating Partial Tree Pattern Queries on XML Data", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 12, pp. 2244-2259, Dec. 2012, doi:10.1109/TKDE.2011.137