Issue No. 12, Dec. 2012 (vol. 24)
pp. 2113-2126
Dongwon Kim, Pohang University of Science and Technology (POSTECH)
Hyeonseung Im, Pohang University of Science and Technology (POSTECH)
Sungwoo Park, Pohang University of Science and Technology (POSTECH)
ABSTRACT
With the rapid increase in the amount of available uncertain data, probabilistic skyline computation on uncertain databases has become an important research topic. Previous work on probabilistic skyline computation, however, either identifies only those objects whose skyline probabilities exceed a given threshold or is useful only for 2D data sets. In this paper, we develop a probabilistic skyline algorithm called PSkyline, which computes the exact skyline probabilities of all objects in a given uncertain data set. PSkyline aims to identify blocks of instances whose skyline probability is zero and, more importantly, to find incomparable groups of instances so that it can dispense with unnecessary dominance tests altogether. To increase the chance of finding such blocks and groups of instances, PSkyline uses a new in-memory tree structure called Z-tree. We also develop an online probabilistic skyline algorithm called O-PSkyline for uncertain data streams and a top-k probabilistic skyline algorithm called K-PSkyline, which finds the k objects with the highest skyline probabilities. Experimental results show that all the proposed algorithms scale well to large and high-dimensional uncertain databases.
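To make the quantity being computed concrete, here is a minimal brute-force sketch. It is not PSkyline, and it rests on assumptions not stated in the abstract: each uncertain object is a set of instances whose occurrence probabilities sum to at most 1, instances of the same object are mutually exclusive, distinct objects are independent, and smaller values are preferred in every dimension. Under these assumptions, an instance u of object U has skyline probability p(u) times the product, over every other object V, of (1 minus the total probability of V's instances that dominate u). An object's skyline probability is then the sum over its instances. The names `dominates`, `skyline_probabilities`, and `top_k` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (assumption): an instance is (point, probability).
Instance = Tuple[Tuple[float, ...], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumes smaller is better: a dominates b if a <= b everywhere and a < b somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(objects: Dict[str, List[Instance]]) -> Dict[str, float]:
    """Brute-force exact skyline probability of every object (quadratic dominance tests)."""
    result: Dict[str, float] = {}
    for name, instances in objects.items():
        total = 0.0
        for point, prob in instances:
            pr = prob  # probability that this instance occurs
            for other, other_instances in objects.items():
                if other == name:
                    continue  # instances of the same object are mutually exclusive
                # Probability that `other` occurs as an instance dominating `point`.
                dominated_mass = sum(q for v, q in other_instances if dominates(v, point))
                pr *= 1.0 - dominated_mass  # distinct objects assumed independent
            total += pr
        result[name] = total
    return result


def top_k(objects: Dict[str, List[Instance]], k: int) -> List[Tuple[str, float]]:
    """Naive top-k: compute all exact probabilities, then sort in descending order."""
    probs = skyline_probabilities(objects)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    data = {
        "A": [((1.0, 1.0), 0.75), ((3.0, 3.0), 0.25)],
        "B": [((2.0, 2.0), 1.0)],
        "C": [((4.0, 4.0), 1.0)],
    }
    print(skyline_probabilities(data))  # {'A': 0.75, 'B': 0.25, 'C': 0.0}
    print(top_k(data, 2))  # [('A', 0.75), ('B', 0.25)]
```

The nested loops run a dominance test for every pair of instances. This is the kind of cost that PSkyline's zero-probability blocks and incomparable instance groups are meant to cut down. Likewise, `top_k` simply sorts all exact probabilities and is only a naive stand-in for what K-PSkyline computes.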
INDEX TERMS
Probabilistic logic, Probability distribution, Mathematical model, Equations, Query processing, Upper bound, data stream, Skyline computation, skyline probability, uncertain database
CITATION
Dongwon Kim, Hyeonseung Im, Sungwoo Park, "Computing Exact Skyline Probabilities for Uncertain Databases", IEEE Transactions on Knowledge & Data Engineering, vol. 24, no. 12, pp. 2113-2126, Dec. 2012, doi:10.1109/TKDE.2011.164