2009 50th Annual IEEE Symposium on Foundations of Computer Science
Space-Efficient Framework for Top-k String Retrieval Problems
Atlanta, Georgia
October 25-October 27
ISBN: 978-0-7695-3850-1
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/FOCS.2009.19
Given a set ${\cal D}=\{d_1, d_2, \ldots, d_D\}$ of $D$ strings of total length $n$, our task is to report the "most relevant" strings for a given query pattern $P$. This involves somewhat more advanced query functionality than the usual pattern matching, as some notion of "most relevant" is involved. In the information retrieval literature, this task is best achieved by using inverted indexes. However, inverted indexes work only for some predefined set of patterns. In the pattern matching community, the most popular pattern-matching data structures are suffix trees and suffix arrays. However, a typical suffix tree search involves going through all the occurrences of the pattern over the entire string collection, which might be far more than the number of relevant documents required. The first formal framework to study this kind of retrieval problem was given by [Muthukrishnan, 2002]. He considered two metrics for relevance: frequency and proximity. He took a threshold-based approach on these metrics and gave data structures taking $O(n \log n)$ words of space. We study this problem in a slightly different framework of reporting the top $k$ most relevant documents (in sorted order) under similar and more general relevance metrics. Our framework gives a linear-space data structure with optimal query times for arbitrary score functions. As a corollary, it improves the space utilization for the problems in [Muthukrishnan, 2002] while maintaining optimal query performance. We also develop compressed variants of these data structures for several specific relevance metrics.
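To make the query semantics concrete, the following is a minimal brute-force sketch of top-$k$ retrieval under the frequency metric. It is not the data structure of the paper: it scans every document on every query, which is exactly the cost an index is meant to avoid. The function names `count_occurrences` and `top_k_by_frequency`, the choice to count overlapping occurrences, and the tie-breaking by document index are assumptions made for illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_frequency(docs, pattern, k):
    """Return up to k (doc_index, frequency) pairs, highest frequency first.

    Documents in which the pattern does not occur are never reported.
    Ties are broken by smaller document index (an illustrative assumption).
    """
    hits = []
    for index, doc in enumerate(docs):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            hits.append((freq, index))
    # Select the k best entries in sorted order: frequency descending,
    # then document index ascending.
    best = heapq.nsmallest(k, hits, key=lambda t: (-t[0], t[1]))
    return [(index, freq) for freq, index in best]


if __name__ == "__main__":
    collection = ["banana", "bandana", "cabana", "apple"]
    print(top_k_by_frequency(collection, "an", 2))
```

Each query here costs time proportional to the total length $n$ of the collection, regardless of $k$. A different relevance metric could be substituted by replacing the scoring call inside `top_k_by_frequency`.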
Index Terms:
document retrieval, text indexing, succinct data structures, top-$k$ queries
Citation:
Wing-Kai Hon, Rahul Shah, Jeffrey Scott Vitter, "Space-Efficient Framework for Top-k String Retrieval Problems," focs, pp.713-722, 2009 50th Annual IEEE Symposium on Foundations of Computer Science, 2009
