Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases
Issue No. 04 - August (1995 vol. 7)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/69.404035
<p><it>Abstract</it>—Database optimizers require statistical information about data distributions in order to evaluate result sizes and access plan costs for processing user queries. In this context, we consider the problem of estimating the size of the projections of a database relation when measures of attribute domain cardinalities are maintained in the system. Our main theoretical contribution is a new formal model (<it>AD</it>), valid under the hypotheses of attribute independence and uniform distribution of attribute values, derived by considering the difference between the time-invariant domain (the set of values that an attribute can assume) and the time-dependent “active domain” (the set of values that are actually assumed at a given time). Early models developed under the same assumptions are shown to be formally incorrect. Since the <it>AD</it> model is computationally demanding, we also introduce an approximate, easy-to-compute model (<it>A</it><super>2</super><it>D</it>) that, unlike previous approximations, yields low errors over the entire parameter space of the active domain cardinalities. Finally, we extend the <it>A</it><super>2</super><it>D</it> model to the case of nonuniform distributions and present experimental results confirming the good behavior of the model.</p>
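The distinction between a domain and its active domain can be illustrated with a small simulation. The sketch below is not the AD or A2D model from the paper, whose formulas the abstract does not give; it only draws tuples under the same stated hypotheses (independent, uniformly distributed attributes) and counts what is actually observed. The function name `simulate_projection`, the domain sizes, and the tuple count are hypothetical choices made for illustration.

```python
import random


def simulate_projection(domain_sizes, n_tuples, seed=0):
    """Draw n_tuples rows with independent, uniformly distributed attributes.

    domain_sizes[i] is the cardinality of the (time-invariant) domain of
    attribute i. Returns (active_domain_sizes, projection_size), where
    active_domain_sizes[i] counts the values of attribute i that actually
    occur, and projection_size is the number of distinct rows after
    duplicate elimination, as in a relational projection.
    """
    rng = random.Random(seed)
    rows = [
        tuple(rng.randrange(d) for d in domain_sizes)
        for _ in range(n_tuples)
    ]
    active = [
        len({row[i] for row in rows})
        for i in range(len(domain_sizes))
    ]
    return active, len(set(rows))


if __name__ == "__main__":
    # Illustrative values only: two attributes, 30 tuples.
    domain_sizes = [50, 20]
    active, proj = simulate_projection(domain_sizes, n_tuples=30)
    print("domain sizes:       ", domain_sizes)
    print("active domain sizes:", active)
    print("projection size:    ", proj)
```

With few tuples relative to the domain sizes, the active domains are typically much smaller than the domains, and the projection size is bounded by both the number of tuples and the product of the active domain cardinalities.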
Relational database, projection, error estimate, combinatorial models, statistical profile, query optimization.
D. Maio and P. Ciaccia, "Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases," in IEEE Transactions on Knowledge & Data Engineering, vol. 7, no. 4, pp. 641-655, 1995.