Issue No. 06 - June (1991 vol. 17)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/32.87280
<p>An approach to estimating record selectivity rooted in the theory of fitting a hierarchy of models in discrete data analysis is presented. In contrast to parametric methods, this approach does not presuppose a distribution pattern to which the actual data conform; it searches for one that fits the actual data. The approach uses parsimonious models wherever appropriate to minimize the storage requirement without sacrificing accuracy. Two-dimensional cases are used as examples to illustrate the proposed method. It is demonstrated that identifying a good-fitting and parsimonious model can drastically reduce storage space and that implementing this technique requires little extra processing effort. The case of perfect or near-perfect association and the idea of keeping information about salient cells of a table are discussed. A strategy to reduce the storage requirement when no good-fitting and parsimonious model is available is proposed. Hierarchical models for three-dimensional cases are presented, along with a description of the W.E. Deming and F.F. Stephan (1940) iterative proportional fitting algorithm, which fits hierarchical models of any dimension.</p>
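To make the storage argument in the abstract concrete, the following is a minimal sketch of iterative proportional fitting for the simplest two-dimensional case, the independence model, in which only the row and column totals of a table are stored and the cell counts are reconstructed from them. It is an illustration under stated assumptions and not the paper's implementation: the function names `ipf_2d` and `selectivity`, the tolerance, and the sample totals are all hypothetical.

```python
def ipf_2d(row_totals, col_totals, max_iterations=50, tol=1e-9):
    """Fit a two-way table to stored marginals (independence model only).

    Hypothetical helper for illustration; starts from a table of ones and
    alternately rescales rows and columns until both sets of totals match.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    fitted = [[1.0] * n_cols for _ in range(n_rows)]
    for _ in range(max_iterations):
        # Step 1: rescale each row so it sums to its stored row total.
        for i in range(n_rows):
            row_sum = sum(fitted[i])
            if row_sum > 0:
                factor = row_totals[i] / row_sum
                fitted[i] = [value * factor for value in fitted[i]]
        # Step 2: rescale each column so it sums to its stored column total.
        for j in range(n_cols):
            col_sum = sum(fitted[i][j] for i in range(n_rows))
            if col_sum > 0:
                factor = col_totals[j] / col_sum
                for i in range(n_rows):
                    fitted[i][j] *= factor
        # Stop once the row totals are still satisfied after the column step.
        worst = max(abs(sum(fitted[i]) - row_totals[i]) for i in range(n_rows))
        if worst < tol:
            break
    return fitted


def selectivity(fitted, i, j):
    """Estimated fraction of records falling in cell (i, j)."""
    total = sum(sum(row) for row in fitted)
    return fitted[i][j] / total


# Assumed sample data: two attributes with two value ranges each.
fitted = ipf_2d(row_totals=[40, 80], col_totals=[90, 30])
estimate = selectivity(fitted, 0, 0)
```

For an r-by-c table this model stores r + c totals instead of r × c cell counts, which is where the space saving comes from when the independence model fits well. Models with interaction terms, and tables of three or more dimensions, need the general form of the algorithm, which this sketch does not cover.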
contingency approach; record selectivities; discrete data analysis; parsimonious models; storage requirement; storage space; near-perfect association; three-dimensional cases; iterative proportional fitting algorithm; hierarchical models; information retrieval systems; relational databases; storage management
P. Chu, "A Contingency Approach to Estimating Record Selectivities," in IEEE Transactions on Software Engineering, vol. 17, no. 6, pp. 544-552, 1991.