Generalized Normal Forms for Probabilistic Relational Data
May/June 2002 (vol. 14 no. 3)
pp. 485-497

Abstract—In recent years, several approaches have been proposed for representing uncertain data in a database. These approaches have typically extended the relational model by incorporating probability measures to capture the uncertainty associated with data items. However, previous research has not directly addressed normalization as a means of reducing data redundancy and data anomalies in probabilistic databases. In this paper, we examine this issue. To that end, we generalize the concept of functional dependency to stochastic dependency and use it to extend the scope of normal forms to probabilistic databases. Our approach is a consistent extension of conventional normalization theory and reduces to it when the data carry no uncertainty.

Index Terms:
Probabilistic relational model, data uncertainty, belief network, functional dependency, stochastic dependency, probabilistic normal form
Citation:
D. Dey, S. Sarkar, "Generalized Normal Forms for Probabilistic Relational Data," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 485-497, May-June 2002, doi:10.1109/TKDE.2002.1000338