Issue No.05 - May (2012 vol.24)
pp: 840-853
Apostolos N. Papadopoulos, Aristotle University, Thessaloniki
Maria Kontaki, Aristotle University, Thessaloniki
Top-k dominating queries use an intuitive scoring function which ranks multidimensional points with respect to their dominance power, i.e., the number of points that a point dominates. The k points with the best (e.g., highest) scores are returned to the user. Both top-k and skyline queries have been studied in a streaming environment, where changes to the data set are very frequent. In such an environment, continuous query processing techniques are required to monitor query results efficiently, since periodic query re-execution is computationally intensive and therefore prohibitive. This work presents the first study of continuous top-k dominating queries over data streams. In comparison to continuous top-k and skyline queries, continuous top-k dominating queries pose additional challenges. Three exact algorithms (BFA, EVA, ADA) are studied, and among them ADA, which is enhanced with additional optimization techniques, shows the best overall performance. In some cases, we are willing to trade accuracy for speed. To this end, two approximate algorithms are proposed (AHBA and AMSA). AHBA offers probabilistic guarantees regarding the accuracy of the result based on the Hoeffding bound, whereas AMSA performs a more aggressive computation, resulting in more efficient processing. Evaluation results, based on real-life and synthetic data sets, show the efficiency and scalability of our techniques.
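To make the scoring function concrete, the following is a minimal Python sketch of the naive approach the abstract argues against: recomputing every dominance score from scratch each time the window changes. It is not BFA, EVA, ADA, AHBA, or AMSA, whose details the abstract does not give. Several things here are assumptions made only for illustration: smaller values are treated as better in every dimension, the stream is modeled as a count-based sliding window, and the names `dominates`, `top_k_dominating`, `monitor`, and `hoeffding_epsilon` are hypothetical. The last helper only evaluates the standard two-sided Hoeffding bound for a mean of values in [0, 1]; how AHBA actually applies the bound is not stated here.

```python
from collections import deque
import heapq
import math


def dominates(p, q):
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def top_k_dominating(points, k):
    """Brute-force top-k dominating query: O(n^2) dominance checks.

    Returns a list of (score, point) pairs, highest score first, where
    score is the number of points in `points` that the point dominates.
    """
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def monitor(stream, window_size, k):
    """Naive continuous monitoring over a count-based sliding window.

    Re-executes the full query after every arrival, which is exactly the
    periodic re-execution cost that continuous algorithms try to avoid.
    """
    window = deque(maxlen=window_size)  # oldest point expires automatically
    for point in stream:
        window.append(point)
        yield top_k_dominating(list(window), k)


def hoeffding_epsilon(n, delta):
    """Two-sided Hoeffding error for a mean of n samples bounded in [0, 1].

    With probability at least 1 - delta, the sample mean is within
    epsilon of the true mean.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

For example, `list(monitor([(3, 3), (2, 2), (1, 1)], window_size=2, k=1))` yields one result per arrival, each a single `(score, point)` pair for the current window. The quadratic cost per update is what makes this baseline impractical on fast streams.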
Top-k dominating queries, data streams, continuous queries, algorithms, analysis, approximation.
Apostolos N. Papadopoulos, Maria Kontaki, "Continuous Top-k Dominating Queries", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 5, pp. 840-853, May 2012, doi:10.1109/TKDE.2011.43