On the Data Model and Access Method of Summary Data Management
December 1989 (vol. 1 no. 4)
pp. 519-529

A data model and an access method for summary data management are presented. A summary datum is represented as a trinary tuple (statistical function, category, summary): it is metaknowledge obtained by applying a statistical function to a category of individual records, which are typically stored in a conventional database. For instance, (average-income, female engineer with 10 years' experience and master's degree, $45000) is a summary datum. The derivability problem has been found to be computationally intractable in general; the proposed summary data model enforces a disjointness constraint, which alleviates this intractability without loss of information. To store, manage, and access summary data, a multidimensional access method called the summary data (SD) tree is proposed. By preserving the category hierarchy, the SD tree supports efficient operations, including summary data search, derivation, insertion, and deletion.
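As an illustration only, the trinary tuple and the role of disjointness can be sketched in Python. The abstract does not give the paper's formal definitions, so everything here is an assumption: the names `SummaryDatum`, `make_category`, `disjoint`, and `derive_count`, the encoding of a category as attribute/value constraints, and the engineer counts used in the usage note. The sketch shows only the simplest case, in which a count over a union of categories is derived by addition once the categories are known to be pairwise disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryDatum:
    """Hypothetical encoding of (statistical function, category, summary)."""
    function: str    # e.g. "average-income" or "count"
    category: tuple  # sorted (attribute, value) pairs describing the category
    summary: float   # the summarized value


def make_category(**constraints):
    """Build a category from attribute=value constraints (assumed encoding)."""
    return tuple(sorted(constraints.items()))


def disjoint(cat_a, cat_b):
    """Two categories are disjoint if some shared attribute has different values."""
    a, b = dict(cat_a), dict(cat_b)
    return any(a[key] != b[key] for key in a.keys() & b.keys())


def derive_count(parts):
    """Derive the count of a union of categories by summing the parts' counts.

    Plain addition is only valid when the categories are pairwise disjoint,
    so overlapping inputs are rejected instead of being double counted.
    """
    for i, p in enumerate(parts):
        if p.function != "count":
            raise ValueError("only count summaries can be added this way")
        for q in parts[i + 1:]:
            if not disjoint(p.category, q.category):
                raise ValueError("categories overlap; counts cannot simply be added")
    return sum(p.summary for p in parts)


# The example datum from the abstract, in this assumed encoding.
example = SummaryDatum(
    "average-income",
    make_category(sex="female", occupation="engineer",
                  experience_years=10, degree="master"),
    45000,
)
```

With made-up counts, `derive_count([SummaryDatum("count", make_category(sex="female", occupation="engineer"), 120), SummaryDatum("count", make_category(sex="male", occupation="engineer"), 380)])` adds the two values, while passing two overlapping categories raises `ValueError`.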

Index Terms:
data model; access method; summary data management; trinary tuple; statistical function; category; summary; metaknowledge; average-income; female engineer; experience; computational complexity; derivability problem; disjointness constraint; multidimensional access method; SD tree; summary data search; insertion; deletion; data structures; database management systems
M.C. Chen, L.P. McNamee, "On the Data Model and Access Method of Summary Data Management," IEEE Transactions on Knowledge and Data Engineering, vol. 1, no. 4, pp. 519-529, Dec. 1989, doi:10.1109/69.43426