Issue No.05 - May (2011 vol.23)
pp: 727-741
Lei Zou, Peking University, Beijing
Given a record set D and a query score function F, a top-k query returns the k records from D whose attribute values yield the highest values of F. In this paper, we investigate the intrinsic connection between top-k queries and dominant relationships between records. Based on this connection, we propose an efficient layer-based indexing structure, the Pareto-Based Dominant Graph (DG), to improve query efficiency. Specifically, DG is built offline to express the dominant relationships between records, and a top-k query is implemented as a graph traversal problem, solved by the Traveler algorithm. We prove theoretically that the size of the search space (that is, the number of records retrieved from the record set to answer a top-k query) in our algorithm is directly related to the cardinality of the skyline points in the record set (see Theorem 3). To reduce the I/O cost of the Traveler algorithm, we propose a cluster-based storage schema. We also propose cost estimation methods. Based on the cost analysis, we propose an optimization technique, pseudorecord, to further improve search efficiency. To handle top-k queries on high-dimensional record sets, we also propose the N-Way Traveler algorithm. To maintain DG efficiently, we propose “Insertion” and “Deletion” algorithms for DG. Finally, extensive experiments demonstrate that our proposed methods significantly outperform their counterparts, including both classical and state-of-the-art top-k algorithms.
Top-k query, database, algorithms.
Lei Zou, "Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 5, pp. 727-741, May 2011, doi:10.1109/TKDE.2010.240
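The idea summarized in the abstract, answering a top-k query by traversing a layered dominance graph, can be illustrated with a small sketch. This is not the paper's implementation: it assumes that larger attribute values are better, that the score function F is monotone (a record never scores lower than a record it dominates), that layers are obtained by repeatedly peeling the skyline, and that edges connect only adjacent layers. The names `dominates`, `build_index`, and `traverse_top_k` are hypothetical, and the sketch ignores the I/O, pseudorecord, N-Way, and maintenance aspects entirely.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[float, ...]
Index = Tuple[List[List[int]], Dict[int, List[int]], Dict[int, List[int]]]


def dominates(a: Record, b: Record) -> bool:
    """a dominates b: a >= b on every attribute and a > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def build_index(records: Sequence[Record]) -> Index:
    """Offline step (assumed form): peel skyline layers, then link adjacent layers."""
    remaining = list(range(len(records)))
    layers: List[List[int]] = []
    while remaining:
        # The current layer holds every remaining record that no other
        # remaining record dominates (the skyline of what is left).
        layer = [i for i in remaining
                 if not any(dominates(records[j], records[i]) for j in remaining)]
        layers.append(layer)
        peeled = set(layer)
        remaining = [i for i in remaining if i not in peeled]

    parents: Dict[int, List[int]] = {i: [] for i in range(len(records))}
    children: Dict[int, List[int]] = {i: [] for i in range(len(records))}
    for upper, lower in zip(layers, layers[1:]):
        for c in lower:
            for p in upper:
                if dominates(records[p], records[c]):
                    parents[c].append(p)
                    children[p].append(c)
    return layers, parents, children


def traverse_top_k(records: Sequence[Record], index: Index,
                   score: Callable[[Record], float], k: int) -> List[int]:
    """Online step: return indices of k highest-scoring records (monotone score assumed)."""
    layers, parents, children = index
    # A record becomes a candidate only after all of its parents were returned.
    pending = {i: len(ps) for i, ps in parents.items()}
    first_layer = layers[0] if layers else []
    heap = [(-score(records[i]), i) for i in first_layer]
    heapq.heapify(heap)

    result: List[int] = []
    while heap and len(result) < k:
        _, i = heapq.heappop(heap)  # best candidate seen so far
        result.append(i)
        for c in children[i]:
            pending[c] -= 1
            if pending[c] == 0:
                heapq.heappush(heap, (-score(records[c]), c))
    return result


# Illustrative data only: two attributes per record, equal-weight linear score.
example_records = [(9, 1), (7, 7), (1, 9), (6, 6), (5, 2), (2, 5), (1, 1)]
example_index = build_index(example_records)
example_top2 = traverse_top_k(example_records, example_index,
                              lambda r: 0.5 * r[0] + 0.5 * r[1], 2)
print([example_records[i] for i in example_top2])
```

In this sketch only records whose dominating parents have already been returned are ever scored, so the number of records touched depends on the layer sizes rather than on the size of the whole record set. That is the intuition behind relating the search space to the skyline cardinality, although the precise statement is the paper's Theorem 3 and is not reproduced here.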