An Algebra for Probabilistic Databases
April 1994 (vol. 6 no. 2)
pp. 293-303

An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
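
The homomorphism claim can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that a probabilistic relation is a probability distribution over tuples, that probabilistic projection is marginalization, and that the map to the relational model takes a distribution to its support (the tuples with positive probability). The names `project`, `support`, and `relational_project` are hypothetical, and only projection is covered.

```python
from collections import defaultdict

# Assumed representation (not from the paper): a probabilistic relation is a
# tuple of attribute names plus a dict mapping value tuples to probabilities.

def project(attrs, dist, keep):
    """Probabilistic projection, assumed here to be marginalization onto `keep`."""
    idx = [attrs.index(a) for a in keep]
    out = defaultdict(float)
    for row, p in dist.items():
        out[tuple(row[i] for i in idx)] += p
    return dict(out)

def support(dist):
    """Ordinary relation consisting of the tuples with positive probability."""
    return {row for row, p in dist.items() if p > 0}

def relational_project(attrs, rel, keep):
    """Standard relational projection of a set of tuples onto `keep`."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rel}

attrs = ("A", "B")
dist = {
    ("a0", "b0"): 0.5,
    ("a0", "b1"): 0.25,
    ("a1", "b1"): 0.25,
    ("a1", "b0"): 0.0,  # zero-probability tuple, absent from the support
}

marg = project(attrs, dist, ["A"])

# Taking the support after projecting agrees with projecting the support.
same = support(marg) == relational_project(attrs, support(dist), ["A"])
```

Under these assumptions, taking the support commutes with projection, which is the sense in which relational projection would be an image of its probabilistic counterpart. The restriction to α-acyclic schemes in the abstract is not exercised by this single-relation example.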

Index Terms:
algebra; relational algebra; probability; data structures; database management systems; decision support systems; Bayes methods; Markov processes; uncertainty handling; database theory; probabilistic databases; probabilistic data model; probabilistic algebra; α-acyclic database schemes; homomorphic image; statistical smoothing procedure; decision support; uncertain information; Bayes networks; Markov networks
Citation:
M. Pittarelli, "An Algebra for Probabilistic Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 2, pp. 293-303, April 1994, doi:10.1109/69.277772