Proceedings 18th International Conference on Data Engineering (2002)
San Jose, California
Feb. 26, 2002 to Mar. 1, 2002
Chung-Min Chen, Telcordia Technologies
Yibei Ling, Telcordia Technologies
Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. In this paper, we study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping strategies, to determine a range query that yields high recall with low access cost. Our experiments on real-world datasets show that, given the same memory budget, our sampling-based estimator outperforms a previous histogram-based method in terms of access cost while achieving the same level of recall. Furthermore, unlike the histogram-based approach, our sampling-based query mapping scheme scales well to high-dimensional data and is easy to implement, with low maintenance cost.
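To make the idea of mapping a top-k query to a range query concrete, the following is a minimal sketch, not the paper's actual estimator or any of its mapping strategies. It assumes a uniform random sample held in memory, the L-infinity distance (so that the search region is an axis-aligned box a relational DBMS could evaluate as a range query), and a hypothetical `slack` factor for widening the estimated radius. All function names are illustrative.

```python
import math
import random

def linf(a, b):
    """Chebyshev (L-infinity) distance; a ball under it is an axis-aligned box."""
    return max(abs(x - y) for x, y in zip(a, b))

def estimate_radius(sample, n_total, query, k, slack=1.0):
    """Estimate a radius expected to enclose roughly k records.

    Assumed mapping: k of n_total records correspond to about
    k * len(sample) / n_total sample points, so the distance to that
    sample neighbour (scaled by the hypothetical `slack`) is the radius.
    """
    dists = sorted(linf(p, query) for p in sample)
    rank = max(1, math.ceil(k * len(sample) / n_total))
    rank = min(rank, len(dists))
    return slack * dists[rank - 1]

def range_query(data, query, radius):
    """Stand-in for the DBMS range query: a box of half-width `radius`."""
    return [p for p in data
            if all(abs(x - q) <= radius for x, q in zip(p, query))]

def approx_top_k(data, sample, query, k, slack=1.0):
    """Answer a top-k query through one estimated range query."""
    radius = estimate_radius(sample, len(data), query, k, slack)
    candidates = range_query(data, query, radius)
    candidates.sort(key=lambda p: linf(p, query))
    return candidates[:k]

def recall(approx, data, query, k):
    """Fraction of the exact top-k that the approximate answer contains."""
    exact = sorted(data, key=lambda p: linf(p, query))[:k]
    return len(set(approx) & set(exact)) / k

# Illustrative usage on synthetic data (not the paper's datasets).
rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(500)]
sample = rng.sample(data, 50)
answer = approx_top_k(data, sample, (0.5, 0.5), k=10, slack=1.2)
```

In this sketch, a larger `slack` widens the box, which tends to raise recall at the price of retrieving more candidate records; that is the recall versus access cost trade-off the abstract refers to.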
Top-k query, Sampling, Range query
Chung-Min Chen, Yibei Ling, "A Sampling-Based Estimator for Top-k Query", Proceedings 18th International Conference on Data Engineering, p. 617, 2002, doi:10.1109/ICDE.2002.994779