Issue No.04 - April (2013 vol.25)
pp: 764-775
Reza Akbarinia, INRIA and LIRMM, Montpellier
Patrick Valduriez, INRIA and LIRMM, Montpellier
Guillaume Verger, INRIA and LIRMM, Montpellier
ABSTRACT
SUM queries are crucial for many applications that deal with uncertain data. In this paper, we study ALL_SUM queries, which return all possible sum values together with their probabilities. In general, there is no efficient solution to the problem of evaluating ALL_SUM queries. However, for many practical applications, where aggregate values are small integers or real numbers with small precision, efficient solutions can be developed. Based on a recursive approach, we propose a new solution for those applications. We implemented our solution and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results show its effectiveness.
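To make the ALL_SUM semantics concrete, here is a minimal sketch that computes every possible sum and its probability. It is not the authors' algorithm: it assumes a simple tuple-level model in which each tuple carries a non-negative integer value and appears independently with a given membership probability. The function name `all_sum` and the `(value, prob)` input layout are hypothetical. Tuples are processed one at a time; for each partial sum s, the new tuple either is absent (s is kept, weighted by 1 - prob) or present (s + value, weighted by prob).

```python
from collections import defaultdict


def all_sum(tuples):
    """Return {sum_value: probability} over all possible worlds.

    Assumption (not from the paper): `tuples` is a list of (value, prob)
    pairs, where value is a non-negative integer and prob is the
    independent probability that the tuple exists.
    """
    dist = {0: 1.0}  # with no tuples processed, the sum is 0 with certainty
    for value, prob in tuples:
        nxt = defaultdict(float)
        for s, p in dist.items():
            nxt[s] += p * (1.0 - prob)   # tuple absent: sum unchanged
            nxt[s + value] += p * prob   # tuple present: sum grows by value
        dist = nxt
    # Drop sums that cannot occur and return them in ascending order.
    return {s: p for s, p in sorted(dist.items()) if p > 0.0}


if __name__ == "__main__":
    print(all_sum([(1, 0.5), (2, 0.5)]))
    # {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
```

The number of entries in the result is bounded by the largest possible sum plus one, which is why this style of computation is only practical when the aggregate values are small integers (or reals that can be scaled to small integers), as the abstract notes.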
INDEX TERMS
Decision support systems, Complexity theory, Probabilistic logic, Indexes, Aggregates, Distribution functions, query processing, Database management, systems
CITATION
Reza Akbarinia, Patrick Valduriez, Guillaume Verger, "Efficient Evaluation of SUM Queries over Probabilistic Data", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 4, pp. 764-775, April 2013, doi:10.1109/TKDE.2012.62