Issue No.03 - March (2010 vol.22)
pp: 420-436
Xiang Lian , Hong Kong University of Science and Technology, Hong Kong
Lei Chen , Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Recently, many new applications, such as sensor data monitoring and mobile device tracking, have raised the issue of uncertain data management. In contrast to "certain" data, objects in an uncertain database are not exact points; instead, each often resides within a region. In this paper, we study ranked queries over uncertain data. Ranked queries have been studied extensively in the traditional database literature because of their popularity in many applications, such as decision making, recommendation, and data mining tasks. Many proposals have been made to improve the efficiency of answering ranked queries. However, the existing approaches all assume that the underlying data are exact (or certain). Because of the intrinsic differences between uncertain and certain data, these methods work only for ranked queries in certain databases and cannot be applied directly to the uncertain case. Motivated by this, we propose novel solutions to speed up the probabilistic ranked query (PRank) with monotonic preference functions over an uncertain database. Specifically, we introduce two effective pruning methods, spatial pruning and probabilistic pruning, to reduce the PRank search space. We also study a special case of PRank with linear preference functions. We then seamlessly integrate these pruning heuristics into the PRank query procedure. Furthermore, we propose and tackle PRank query processing over the join of two distinct uncertain databases (J-PRank). Extensive experiments demonstrate the efficiency and effectiveness of our proposed approaches in answering PRank queries, in terms of both wall clock time and the number of candidates to be refined.
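To give a rough intuition for how spatial pruning can shrink the search space, here is a minimal Python sketch. It is not the paper's algorithm. It assumes each uncertain object is bounded by an axis-aligned box and that the preference function is non-decreasing in every coordinate, so the score at the lower corner is a lower bound and the score at the upper corner is an upper bound. The names `Box`, `score_bounds`, and `spatial_prune` are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: an uncertainty region is an axis-aligned box
# given by its (lower corner, upper corner).
Box = Tuple[Sequence[float], Sequence[float]]
Preference = Callable[[Sequence[float]], float]


def score_bounds(box: Box, f: Preference) -> Tuple[float, float]:
    """Return (lowest, highest) possible score of an object inside `box`.

    Valid only under the assumption that `f` is non-decreasing in every
    coordinate, so its extremes over the box occur at the two corners.
    """
    lower_corner, upper_corner = box
    return f(lower_corner), f(upper_corner)


def spatial_prune(boxes: List[Box], f: Preference, k: int) -> List[int]:
    """Return indices of objects that might still rank among the top k.

    An object is discarded when at least k other objects are guaranteed to
    score strictly higher, i.e. its upper bound lies below the k-th largest
    lower bound.
    """
    bounds = [score_bounds(b, f) for b in boxes]
    if len(bounds) <= k:
        return list(range(len(boxes)))
    lower_desc = sorted((lo for lo, _ in bounds), reverse=True)
    threshold = lower_desc[k - 1]  # k-th largest guaranteed score
    return [i for i, (_, hi) in enumerate(bounds) if hi >= threshold]


def linear(point: Sequence[float]) -> float:
    """Example linear (hence monotonic) preference with equal weights."""
    return 0.5 * point[0] + 0.5 * point[1]


boxes = [
    ([8.0, 8.0], [9.0, 9.0]),  # score range [8.0, 9.0]
    ([5.0, 5.0], [6.0, 6.0]),  # score range [5.0, 6.0]
    ([1.0, 1.0], [2.0, 2.0]),  # score range [1.0, 2.0]
    ([5.5, 5.5], [7.0, 7.0]),  # score range [5.5, 7.0]
]
candidates = spatial_prune(boxes, linear, k=2)
print(candidates)
```

In this toy input, the third object can never reach the top 2 because two other objects are guaranteed to score at least 5.5, above its best possible score of 2.0. The surviving candidates would still need refinement, which is where a probabilistic step would come in.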
INDEX TERMS
Probabilistic ranked query, probabilistic ranked query on join, PRank, J-PRank, uncertain database.
CITATION
Xiang Lian, Lei Chen, "Ranked Query Processing in Uncertain Databases", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 3, pp. 420-436, March 2010, doi:10.1109/TKDE.2009.112