Statistical Relational Databases: Normal Forms
March 1991 (vol. 3 no. 1)
pp. 55-64

Problems associated with defining normal forms of relational tables relevant to statistical processing are discussed. The concepts of derived identifier, class identifier, derived class-counts, count domains, compact domains, and uniform domains for statistical relational tables are introduced. The structures of the first and the second statistical-normal forms and the relational decompositions needed to achieve them are also discussed. It is shown that the statistical-normal form can be an important method to determine whether the usual statistical analysis techniques are valid. Some suggestions are presented for extending the structured query language (SQL) statements to achieve these operations on statistical relational tables. Some results linking Codd's normal forms with statistical normal forms are discussed. Relational statistical abnormalities, called outlyers, are also discussed.

Index Terms:
statistical relational databases; normal forms; relational tables; derived identifier; class identifier; derived class-counts; count domains; compact domains; uniform domains; relational decompositions; statistical analysis; structured query language; SQL; statistical abnormalities; outlyers; query languages; relational databases
S.P. Ghosh, "Statistical Relational Databases: Normal Forms," IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 1, pp. 55-64, March 1991, doi:10.1109/69.75889