Issue No.04 - April (2013 vol.25)

pp: 945-960

Xingjie Liu, Penn State University, University Park

De-Nian Yang, Academia Sinica, Taipei

Mao Ye, Penn State University, University Park

Wang-Chien Lee, Penn State University, University Park

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.33

ABSTRACT

The skyline query, aiming at identifying a set of skyline tuples that are not dominated by any other tuple, is particularly useful for multicriteria data analysis and decision making. For uncertain databases, a probabilistic skyline query, called P-Skyline, has been developed to return skyline tuples by specifying a probability threshold. However, the answer obtained via a P-Skyline query usually includes skyline tuples undesirably dominating each other when a small threshold is specified; or it may contain much fewer skyline tuples if a larger threshold is employed. To address this concern, we propose a new uncertain skyline query, called U-Skyline query, in this paper. Instead of setting a probabilistic threshold to qualify each skyline tuple independently, the U-Skyline query searches for a set of tuples that has the highest probability (aggregated from all possible scenarios) as the skyline answer. In order to answer U-Skyline queries efficiently, we propose a number of optimization techniques for query processing, including 1) computational simplification of U-Skyline probability, 2) pruning of unqualified candidate skylines and early termination of query processing, 3) reduction of the input data set, and 4) partition and conquest of the reduced data set. We perform a comprehensive performance evaluation on our algorithm and an alternative approach that formulates the U-Skyline processing problem by integer programming. Experimental results demonstrate that our algorithm is 10-100 times faster than using CPLEX, a parallel integer programming solver, to answer the U-Skyline query.
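To make the contrast between the two query types concrete, the following is a minimal brute-force sketch, not the optimized algorithm the paper proposes. It assumes a simple uncertainty model that the abstract does not spell out: each tuple exists independently with a given probability, and smaller attribute values are better. All names (`dominates`, `skyline`, `enumerate_worlds`, `u_skyline`, `p_skyline`, `data`) are hypothetical and chosen for illustration only. The sketch enumerates every possible world, so it is exponential in the number of tuples and only suitable for toy inputs.

```python
from collections import defaultdict
from itertools import product


def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one.

    Assumption: smaller values are preferred.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Names of tuples in `points` (name -> vector) not dominated by any other."""
    return frozenset(
        name
        for name, vec in points.items()
        if not any(dominates(other, vec) for o, other in points.items() if o != name)
    )


def enumerate_worlds(data):
    """Yield (world, probability) pairs; `data` maps name -> (vector, existence prob).

    Assumption: tuples exist independently of one another.
    """
    names = list(data)
    for mask in product([False, True], repeat=len(names)):
        prob, world = 1.0, {}
        for name, present in zip(names, mask):
            vec, p = data[name]
            prob *= p if present else 1.0 - p
            if present:
                world[name] = vec
        yield world, prob


def u_skyline(data):
    """Set of tuples with the highest total probability of being the exact skyline."""
    set_prob = defaultdict(float)
    for world, prob in enumerate_worlds(data):
        set_prob[skyline(world)] += prob
    best = max(set_prob, key=set_prob.get)
    return best, set_prob[best]


def p_skyline(data, threshold):
    """Tuples whose individual skyline probability reaches the threshold."""
    tuple_prob = defaultdict(float)
    for world, prob in enumerate_worlds(data):
        for name in skyline(world):
            tuple_prob[name] += prob
    return {name for name, p in tuple_prob.items() if p >= threshold}


# Toy data: tuple "a" dominates tuple "b" but is less likely to exist.
data = {"a": ((1, 1), 0.5), "b": ((2, 2), 0.9)}
print(u_skyline(data))
print(p_skyline(data, 0.4))
```

On this toy input, the threshold-based query with a threshold of 0.4 returns both tuples even though `a` dominates `b`, which is the behaviour the abstract describes as undesirable. The set-based query instead returns the single set `{a}`, whose aggregated probability is about 0.5.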

INDEX TERMS

Vehicles, Semantics, Linear programming, Query processing, Partitioning algorithms, Optimization, query processing, Skyline query, uncertain databases

CITATION

Xingjie Liu, De-Nian Yang, Mao Ye, Wang-Chien Lee, "U-Skyline: A New Skyline Query for Uncertain Databases",

*IEEE Transactions on Knowledge & Data Engineering*, vol.25, no. 4, pp. 945-960, April 2013, doi:10.1109/TKDE.2012.33