Efficient and Progressive Algorithms for Distributed Skyline Queries over Uncertain Data
Aug. 2012 (vol. 24 no. 8)
pp. 1448-1462
Xiaofeng Ding, Huazhong University of Science and Technology, Wuhan
Hai Jin, Huazhong University of Science and Technology, Wuhan
The skyline operator has received considerable attention from the database community because of its importance in many applications, including multicriteria decision making and preference answering. In many of these applications, uncertain data inherently exist: data collected from different sources in distributed locations often carry imprecise measurements and thus exhibit some degree of uncertainty. Given the network delay and economic cost of sharing and communicating large amounts of distributed data over the Internet, an important problem in this scenario is to retrieve the global skyline tuples from all the distributed local sites with minimum communication cost. Based on the well-known notion of the probabilistic skyline query over centralized uncertain data, in this paper we propose the notion of distributed skyline queries over uncertain data. Furthermore, we propose two communication- and computation-efficient algorithms to retrieve the qualified skylines from distributed local sites. Extensive experiments on both synthetic and real data sets verify the efficiency, effectiveness, and progressiveness of our algorithms.
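
To make the notion of a probabilistic skyline concrete, the following is a minimal centralized sketch, not the distributed algorithms proposed in the paper. It assumes one common tuple-level uncertainty model in which each tuple exists independently with a given probability, smaller values are preferred in every dimension, and a tuple qualifies when its skyline probability reaches a user-chosen threshold. The function names and the sample data are hypothetical.

```python
from math import prod


def dominates(a, b):
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(tuples):
    """tuples: list of (values, existence_probability) pairs.

    Under the assumed independence model, a tuple is in the skyline
    when it exists and none of the tuples dominating it exist.
    """
    result = []
    for values, p in tuples:
        # Probability that every dominating tuple is absent.
        no_dominator = prod(1.0 - q for other, q in tuples if dominates(other, values))
        result.append(p * no_dominator)
    return result


def probabilistic_skyline(tuples, threshold):
    """Keep the tuples whose skyline probability is at least threshold."""
    probs = skyline_probabilities(tuples)
    return [(values, sp) for (values, _), sp in zip(tuples, probs) if sp >= threshold]


# Hypothetical two-dimensional data: (values, existence probability).
data = [((1, 4), 0.5), ((2, 5), 0.8), ((3, 1), 0.9)]
qualified = probabilistic_skyline(data, threshold=0.45)
```

Here (2, 5) is dominated only by (1, 4), so its skyline probability is 0.8 × (1 − 0.5) = 0.4 and it falls below the 0.45 threshold, while the other two tuples are undominated and keep their existence probabilities. In a distributed setting each site holds only part of the data, so a locally computed skyline probability is only an upper bound on the global one.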

Index Terms:
Skyline, distributed database, uncertain data.
Citation:
Xiaofeng Ding, Hai Jin, "Efficient and Progressive Algorithms for Distributed Skyline Queries over Uncertain Data," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 8, pp. 1448-1462, Aug. 2012, doi:10.1109/TKDE.2011.77