ABSTRACT
Queries on semistructured data are hard to process because of the complex nature of the data, and they call for specialized techniques. Existing path-based indexes and query processing algorithms are inefficient when searching for complex structures beyond simple paths, even when the queries are highly selective. We introduce minimal infrequent structures (MIS): structures that 1) exist in the data, 2) are not frequent with respect to a support threshold, and 3) have only frequent substructures. By indexing the occurrences of MIS, we can efficiently locate the highly selective substructures of a query, which improves search performance significantly. We propose an efficient data mining algorithm that finds the minimal infrequent structures. Their occurrences in the XML data are then indexed by a lightweight data structure and used as a fast filter step in query evaluation. We validate the efficiency and applicability of our methods through experiments on both synthetic and real data.
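The three conditions that define an MIS can be illustrated with a small sketch. This is not the paper's mining algorithm, which operates on tree structures in XML data. As a simplifying assumption, each document is flattened to a set of label paths, a "structure" is a set of such paths, and a "substructure" is a proper non-empty subset. The names `support` and `minimal_infrequent_structures` and the sample paths are hypothetical.

```python
from itertools import combinations


def support(structure, documents):
    """Count the documents that contain every path in `structure`."""
    return sum(1 for doc in documents if structure <= doc)


def minimal_infrequent_structures(documents, min_support):
    """Level-wise search over path sets (a flat-set analogy, not the paper's method).

    A candidate of size k is generated only if all of its (k-1)-subsets are
    frequent, so every infrequent candidate that occurs in the data is minimal.
    """
    items = sorted(set().union(*documents))
    level = {frozenset([item]) for item in items}
    mis = set()
    k = 1
    while level:
        frequent = set()
        for candidate in level:
            count = support(candidate, documents)
            if count >= min_support:
                frequent.add(candidate)
            elif count >= 1:
                # Condition 1: occurs in the data; condition 2: infrequent;
                # condition 3: holds by construction of the candidates.
                mis.add(candidate)
        k += 1
        level = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                level.add(union)
    return mis


# Hypothetical sample data: four documents flattened to label paths.
docs = [
    frozenset({"book/title", "book/author", "book/price"}),
    frozenset({"book/title", "book/author"}),
    frozenset({"book/title", "book/price"}),
    frozenset({"book/title", "book/editor"}),
]
result = minimal_infrequent_structures(docs, min_support=2)
```

With a support threshold of 2, the single path `book/editor` is infrequent on its own, while the pair `book/author` and `book/price` is infrequent even though each of the two paths is frequent individually. A query containing either structure could, in this simplified picture, be answered by inspecting only the few documents where that structure occurs.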
INDEX TERMS
Query processing, XML/XSL/RDF, mining methods and algorithms, document indexing.
CITATION
Nikos Mamoulis, Wang Lian, David Wai-lok Cheung, S.M. Yiu, "Indexing Useful Structural Patterns for XML Query Processing", IEEE Transactions on Knowledge & Data Engineering, vol. 17, no. , pp. 997-1009, July 2005, doi:10.1109/TKDE.2005.110