Issue No.12 - Dec. (2012 vol.24)
pp: 2156-2169
Kai Zheng, University of Queensland, Brisbane
Zi Huang, University of Queensland, Brisbane
Aoying Zhou, Institute of East China Normal University, Shanghai
Xiaofang Zhou, University of Queensland, Brisbane
ABSTRACT
With the rapidly increasing availability of uncertain data in many important applications such as location-based services, sensor monitoring, and biological information management systems, uncertainty-aware query processing has received significant research attention from the database community in recent years. In this paper, we investigate a new type of query in the context of uncertain databases, namely the uncertain top-k influential sites query (UTkIS query for short), which can be applied in a wide range of application areas such as marketing analysis and mobile services. Since it is not straightforward to precisely define the semantics of a top-k query over uncertain data, we introduce a novel and more intuitive formulation of the query on the basis of expected rank semantics. To address the efficiency issue caused by possible-worlds exploration, we propose effective pruning rules and a divide-and-conquer paradigm that significantly reduce both the number of candidates and the number of possible worlds to be considered. Finally, we conduct extensive experiments on real data sets to verify the effectiveness and efficiency of the proposed methods.
INDEX TERMS
Recurrent neural networks, Databases, Pipeline processing, Semantics, Probabilistic logic, Nearest neighbor searches, Marine vehicles, top-k query, Uncertain data, reverse nearest neighbor query
CITATION
Kai Zheng, Zi Huang, Aoying Zhou, Xiaofang Zhou, "Discovering the Most Influential Sites over Uncertain Data: A Rank-Based Approach", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 12, pp. 2156-2169, Dec. 2012, doi:10.1109/TKDE.2011.121
REFERENCES
[1] J. Pei, B. Jiang, X. Lin, and Y. Yuan, "Probabilistic Skylines on Uncertain Data," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 15-26, 2007.
[2] J. Chen and R. Cheng, "Efficient Evaluation of Imprecise Location-Dependent Queries," Proc. Int'l Conf. Data Eng. (ICDE), pp. 586-595, 2007.
[3] R. Cheng, D. Kalashnikov, and S. Prabhakar, "Evaluating Probabilistic Queries over Imprecise Data," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), pp. 551-562, 2003.
[4] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J. Vitter, "Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 876-887, 2004.
[5] Y. Tao, R. Cheng, X. Xiao, W. Ngai, B. Kao, and S. Prabhakar, "Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 922-933, 2005.
[6] R. Cheng, J. Chen, M. Mokbel, and C. Chow, "Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data," Proc. Int'l Conf. Data Eng. (ICDE), pp. 973-982, 2008.
[7] R. Cheng, L. Chen, J. Chen, and X. Xie, "Evaluating Probability Threshold K-Nearest-Neighbor Queries over Uncertain Data," Proc. Int'l Conf. Extending Database Technology: Advances in Database Technology (EDBT), pp. 672-683, 2009.
[8] H. Kriegel, P. Kunath, and M. Renz, "Probabilistic Nearest-Neighbor Query on Uncertain Objects," Proc. 12th Int'l Conf. Database Systems for Advanced Applications (DASFAA), 2007.
[9] X. Lian and L. Chen, "Efficient Processing of Probabilistic Reverse Nearest Neighbor Queries over Uncertain Data," VLDB J., vol. 18, no. 3, pp. 787-808, 2009.
[10] X. Lian and L. Chen, "Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), pp. 213-226, 2008.
[11] H. Kriegel, P. Kunath, M. Pfeifle, and M. Renz, "Probabilistic Similarity Join on Uncertain Data," Proc. 11th Int'l Conf. Database Systems for Advanced Applications (DASFAA), p. 295, 2006.
[12] V. Ljosa and A. Singh, "Top-K Spatial Joins of Probabilistic Objects," Proc. Int'l Conf. Data Eng. (ICDE), pp. 566-575, 2008.
[13] T. Xia, D. Zhang, E. Kanoulas, and Y. Du, "On Computing Top-T Most Influential Spatial Sites," Proc. Int'l Conf. Very Large Data Bases (VLDB), p. 957, 2005.
[14] F. Korn and S. Muthukrishnan, "Influence Sets Based on Reverse Nearest Neighbor Queries," ACM SIGMOD Record, vol. 29, no. 2, pp. 201-212, 2000.
[15] J. Kang, M. Mokbel, S. Shekhar, T. Xia, and D. Zhang, "Continuous Evaluation of Monochromatic and Bichromatic Reverse Nearest Neighbors," Proc. Int'l Conf. Data Eng. (ICDE), 2007.
[16] Y. Tao, D. Papadias, and X. Lian, "Reverse KNN Search in Arbitrary Dimensionality," Proc. Int'l Conf. Very Large Data Bases (VLDB), p. 755, 2004.
[17] G. Cormode, F. Li, and K. Yi, "Semantics of Ranking Queries for Probabilistic Data and Expected Ranks," Proc. Int'l Conf. Data Eng. (ICDE), pp. 305-316, 2009.
[18] T. Ge, S. Zdonik, and S. Madden, "Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers," Proc. 35th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), pp. 375-388, 2009.
[19] C. Re, N. Dalvi, and D. Suciu, "Efficient Top-K Query Evaluation on Probabilistic Data," Proc. Int'l Conf. Data Eng. (ICDE), 2007.
[20] M. Soliman, I. Ilyas, and K. Chang, "Top-K Query Processing in Uncertain Databases," Proc. Int'l Conf. Data Eng. (ICDE), pp. 896-905, 2007.
[21] K. Yi, F. Li, D. Srivastava, and G. Kollios, "Efficient Processing of Top-K Queries in Uncertain Databases with X-Relations," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 12, pp. 1669-1682, Dec. 2008.
[22] X. Zhang and J. Chomicki, "Semantics and Evaluation of Top-K Queries in Probabilistic Databases," Proc. Int'l Workshop Database Ranking (DBRank), 2008.
[23] Y. Zhang, X. Lin, G. Zhu, W. Zhang, and Q. Lin, "Efficient Rank Based Knn Query Processing over Uncertain Data," Proc. Int'l Conf. Data Eng. (ICDE), 2010.
[24] X. Lian and L. Chen, "Probabilistic Group Nearest Neighbor Queries in Uncertain Databases," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 6, pp. 809-824, June 2008.
[25] M. Hua, J. Pei, W. Zhang, and X. Lin, "Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), pp. 673-686, 2008.
[26] J. Li, B. Saha, and A. Deshpande, "A Unified Approach to Ranking in Probabilistic Databases," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 769-780, 2009.
[27] C. Yang and K. Lin, "An Index Structure for Efficient Reverse Nearest Neighbor Queries," Proc. Int'l Conf. Data Eng. (ICDE), pp. 485-492, 2001.
[28] I. Stanoi, D. Agrawal, and A. Abbadi, "Reverse Nearest Neighbor Queries for Dynamic Databases," Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 44-53, 2000.
[29] Y. Tao, M. Yiu, and N. Mamoulis, "Reverse Nearest Neighbor Search in Metric Spaces," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 9, pp. 1239-1252, Sept. 2006.
[30] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. The MIT Press, 1990.
[31] W. Hoeffding, "Probability Inequalities for Sums of Bounded Random Variables," J. Am. Statistical Assoc., vol. 58, no. 301, pp. 13-30, 1963.
[32] Y. Theodoridis, "The R-Tree-Portal," http://www.rtreeportal.org, 2003.