Issue No.07 - July (2005 vol.17)
David Wai-lok Cheung, IEEE Computer Society
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.110
Queries on semistructured data are hard to process because of the complex nature of the data, and they call for specialized techniques. Existing path-based indexes and query processing algorithms are not efficient for searching complex structures beyond simple paths, even when the queries are highly selective. We introduce minimal infrequent structures (MIS), which are structures that 1) exist in the data, 2) are not frequent with respect to a support threshold, and 3) have only frequent substructures. By indexing the occurrences of MIS, we can efficiently locate the highly selective substructures of a query, improving search performance significantly. We propose an efficient data mining algorithm that finds the minimal infrequent structures. Their occurrences in the XML data are then indexed by a lightweight data structure and used as a fast filter step in query evaluation. We validate the efficiency and applicability of our methods through experiments on both synthetic and real data.
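The three conditions that define an MIS, and the way an MIS index can serve as a filter, can be illustrated with a small sketch. This is not the paper's mining algorithm or index structure. It assumes a strong simplification: each document is flattened to a set of parent/child label edges, a structure is a set of such edges, and a substructure is any proper subset. The function names, the sample documents, and the threshold are all hypothetical.

```python
from itertools import combinations

def support(structure, docs):
    """Count the documents that contain every edge of `structure`."""
    return sum(1 for d in docs if structure <= d)

def minimal_infrequent_structures(docs, min_support):
    """Brute-force MIS enumeration under the edge-set simplification."""
    edges = sorted(set().union(*docs))
    result = []
    for size in range(1, len(edges) + 1):
        for combo in combinations(edges, size):
            s = frozenset(combo)
            sup = support(s, docs)
            # Conditions 1 and 2: occurs in the data, but below the threshold.
            if not (1 <= sup < min_support):
                continue
            # Condition 3: support can only grow when an edge is removed,
            # so checking the immediate substructures covers all of them.
            if all(support(s - {e}, docs) >= min_support for e in s):
                result.append(s)
    return result

def build_mis_index(docs, mis_list):
    """Map each MIS to the ids of the documents in which it occurs."""
    return {m: {i for i, d in enumerate(docs) if m <= d} for m in mis_list}

def candidate_docs(query, index, n_docs):
    """Filter step: keep only documents containing every MIS inside the query."""
    candidates = set(range(n_docs))
    for m, ids in index.items():
        if m <= query:
            candidates &= ids
    return candidates

# Hypothetical sample data: four documents as sets of parent/child edges.
docs = [
    frozenset({"book/title", "book/author"}),
    frozenset({"book/title", "book/author", "book/price"}),
    frozenset({"book/title", "book/price"}),
    frozenset({"book/title", "book/author", "book/isbn"}),
]
mis = minimal_infrequent_structures(docs, min_support=2)
index = build_mis_index(docs, mis)
query = frozenset({"book/title", "book/author", "book/price"})
print(mis)
print(candidate_docs(query, index, len(docs)))
```

In this toy data, the edge pair `book/author` and `book/price` is infrequent although each edge alone is frequent, so a query containing that pair is narrowed to a single candidate document before any full structural matching. The candidates are a superset of the true answers, so a verification step is still needed afterwards.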
Index Terms: Query processing, XML/XSL/RDF, mining methods and algorithms, document indexing.
Wang Lian, David Wai-lok Cheung, S.M. Yiu, "Indexing Useful Structural Patterns for XML Query Processing", IEEE Transactions on Knowledge & Data Engineering, vol. 17, no. 7, pp. 997-1009, July 2005, doi:10.1109/TKDE.2005.110