Answering Frequent Probabilistic Inference Queries in Databases
April 2011 (vol. 23 no. 4)
pp. 512-526
Shaoxu Song, Hong Kong University of Science and Technology, Hong Kong
Lei Chen, Hong Kong University of Science and Technology, Hong Kong
Jeffrey Xu Yu, The Chinese University of Hong Kong, Hong Kong
Existing solutions for probabilistic inference queries mainly focus on answering a single inference query and seldom address how to efficiently return results for a sequence of frequent queries, a setting that is more common and practical in many real applications. In this paper, we study computation caching and sharing among a sequence of inference queries in databases. We first introduce the clique tree propagation (CTP) algorithm into databases for probabilistic inference queries. We use materialized views to cache the intermediate results of previous inference queries, which may be shared with subsequent queries and consequently reduce the time cost. Moreover, we take the query workload into account to identify the frequently queried variables. To optimize probabilistic inference queries with CTP, we cache these frequently queried variables in the materialized views to maximize reuse. Because different query plans exist, we present heuristics to estimate their costs and select the plan with the lowest estimated cost. Finally, we present an experimental evaluation in relational databases to illustrate the validity and superiority of our approaches in answering frequent probabilistic inference queries.
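
The caching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's CTP implementation: it assumes a hypothetical three-variable chain network A -> B -> C with made-up probabilities, and it stands in for a materialized view with an in-memory table holding the marginal over an assumed set of frequently queried variables. Queries whose variables are covered by that table are answered by summing it down, and all other queries fall back to the full joint. The names InferenceCache, marginal_from_joint, and marginal_from_view are illustrative only.

```python
from itertools import product

# Hypothetical toy network A -> B -> C over binary variables (not from the paper).
VARS = ("A", "B", "C")
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # key: (a, b)
P_C_GIVEN_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # key: (b, c)


def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) from the chain factorization."""
    return P_A[a] * P_B_GIVEN_A[(a, b)] * P_C_GIVEN_B[(b, c)]


def marginal_from_joint(query_vars):
    """Expensive path: sum the full joint down to query_vars."""
    table = {}
    for values in product((0, 1), repeat=len(VARS)):
        assignment = dict(zip(VARS, values))
        key = tuple(assignment[v] for v in query_vars)
        table[key] = table.get(key, 0.0) + joint(*values)
    return table


def marginal_from_view(view_vars, view, query_vars):
    """Cheap path: sum a cached marginal over view_vars down to query_vars."""
    idx = [view_vars.index(v) for v in query_vars]
    table = {}
    for key, p in view.items():
        sub = tuple(key[i] for i in idx)
        table[sub] = table.get(sub, 0.0) + p
    return table


class InferenceCache:
    """Stand-in for a materialized view over the frequently queried variables."""

    def __init__(self, frequent_vars):
        self.view_vars = tuple(frequent_vars)
        self.view = marginal_from_joint(self.view_vars)  # computed once, reused
        self.hits = 0
        self.misses = 0

    def query(self, query_vars):
        query_vars = tuple(query_vars)
        if set(query_vars) <= set(self.view_vars):
            self.hits += 1
            return marginal_from_view(self.view_vars, self.view, query_vars)
        self.misses += 1
        return marginal_from_joint(query_vars)


if __name__ == "__main__":
    cache = InferenceCache(("A", "C"))  # assume A and C dominate the workload
    print(cache.query(("C",)))          # answered from the cached view
    print(cache.query(("B",)))          # not covered, recomputed from the joint
    print(cache.hits, cache.misses)
```

In this sketch the choice of cached variables is fixed by hand. Deriving it from an observed query workload and weighing alternative plans by estimated cost, as the abstract describes, is outside what the example attempts.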

Index Terms:
Probabilistic inference, variable elimination, clique tree propagation.
Shaoxu Song, Lei Chen, Jeffrey Xu Yu, "Answering Frequent Probabilistic Inference Queries in Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 512-526, April 2011, doi:10.1109/TKDE.2010.146