Issue No.02 - February (2011 vol.23)
pp: 204-217
Lijiang Chen, Peking University, Beijing
Bin Cui, Peking University, Beijing
Hua Lu, Aalborg University, Aalborg
The skyline of a multidimensional point set is the subset of interesting points that are not dominated by any other point. In this paper, we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result. We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups. We also employ intragroup optimization and a multifiltering technique to improve skyline query processing within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which help identify unqualified local skyline points early at a data site. In this way, the amount of data transmitted over network connections is reduced, and the overall query response time is shortened further. Cost models and heuristics are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is developed to determine how many filtering points to use for a particular data site. The results of an extensive experimental study demonstrate that our proposals are effective and efficient.
Constrained skyline query, filtering point, distributed query processing.
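The dominance and filtering-point ideas in the abstract can be illustrated with a small sketch. This is not the PaDSkyline algorithm itself. It assumes that smaller values are preferred in every dimension and that the query constraint is a hyper-rectangle; all function names and data values below are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # assumed constraint: (low, high) per dimension


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere.

    Assumption: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def in_constraint(p: Point, box: Box) -> bool:
    """True if p lies inside the constraint hyper-rectangle."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(p, box))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def local_constrained_skyline(
    points: Iterable[Point], box: Box, filters: Sequence[Point] = ()
) -> List[Point]:
    """Local skyline of one site under the constraint, pruned by filtering points.

    The filtering points arrive with the query; any local skyline point that a
    filtering point dominates cannot be in the global result, so it is dropped
    before transmission.
    """
    candidates = [p for p in points if in_constraint(p, box)]
    return [p for p in skyline(candidates)
            if not any(dominates(f, p) for f in filters)]


# Hypothetical two-site example.
box = [(0, 8), (0, 10)]
site_a = [(1, 9), (3, 3), (8, 2), (6, 6)]
site_b = [(2, 8), (4, 4), (7, 7), (9, 1)]

sky_a = local_constrained_skyline(site_a, box)
unfiltered_b = local_constrained_skyline(site_b, box)
filtered_b = local_constrained_skyline(site_b, box, filters=[(3, 3)])

# The global answer is the same with or without filtering.
global_skyline = skyline(sky_a + filtered_b)
```

In this example, sending the single filtering point (3, 3) from site A lets site B drop (4, 4) locally, so one fewer point is transmitted while the merged result is unchanged. How many filtering points to send, and which ones, is what the paper's cost models and heuristics address; the sketch fixes that choice by hand.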
Lijiang Chen, Bin Cui, Hua Lu, "Constrained Skyline Query Processing against Distributed Data Sites", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 2, pp. 204-217, February 2011, doi:10.1109/TKDE.2010.103