This Article 
 Bibliographic References 
 Add to: 
Estimating Block Selectivities for Physical Database Design
February 1992 (vol. 4 no. 1)
pp. 89-98

Access path deployment is a critical issue in physical database design. Access paths typically include a clustered index as the primary access path and a set of secondary indexes as auxiliary access paths. To deploy the right access paths requires an effective algorithm and accurate estimation of the parameters used by the algorithm. One parameter central to any index-selection algorithm is the block selectivity of a query. Existing methods for estimating block selectivities are limited by restrictive assumptions. Furthermore, most existing methods produce estimates useful for aiding the selection of secondary indexes only. Little research has been done in the area of estimating block selectivities for supporting the selection of the clustered index. The paper presents a set of methods that do not depend on any specific assumption, produce accurate estimates, and can be used to aid in selecting the clustered index as well as secondary indexes.

[1] H. D. Anderson and P. B. Berra, "Minimum cost selection of secondary indexes for formatted files,"ACM Trans. Database Syst., vol. 2, no. 1, pp. 68-90, Mar. 1977.
[2] R. Bonanno, D. Maio, and P. Tiberio, "Secondary index selection in relational database physical design,"Comput. J., vol. 28, no. 4, pp. 398-405, Aug. 1985.
[3] F. Bonfatti, D. Maio, and P. Tiberio, "A separability-based method for secondary index selection in physical database design," inMethodology and Tools for Data Base Design, S. Ceri, Ed. New York: Elsevier North-Holland, 1983, pp. 148-160.
[4] S. Bergamaschi and M. R. Scalas, "Choice of the optimal number of blocks for data access by an index,"Inform. Syst., vol. 11, no. 3, pp. 199-209, 1986.
[5] A. Cardenas, "Analysis and performance of inverted data-base structures,"Commun. ACM, vol. 18, no. 5, pp. 253-263, 1975.
[6] T. Cheung, "Estimating block accesses and number of records in file management,"Commun. ACM, vol. 25, no. 7, pp. 484-487, July 1982.
[7] S. Christodoulakis, "Implications of certain assumptions in database performance evaluation,"ACM Trans. Database Syst.vol. 9, no. 2, pp. 163-186, June 1984.
[8] S. Christodoulakis, "Estimating block selectivities,"Inform. Syst., vol. 9, no. 1, pp. 69-79, 1984.
[9] P. C. Chu, "Database access path selection: A two-step approach,"Inform. Syst., vol. 14, no. 5, pp. 385-392, 1989.
[10] D. Comer, "The difficulty of optimum index selection,"ACM Trans. Database Syst., vol. 3, no. 4, pp. 440-445, 1978.
[11] S. Finkelstein, M. Schkolnick, and P. Tiberio, "Physical database design for relational databases,"ACM Trans. Database Syst., vol. 13, no. 1, pp. 91-129, Mar. 1988.
[12] M. Hatzopoulous and J. Y. Kollias, "On the selection of a reduced set of indexes,"Comput. J., vol. 28, no. 4, pp. 406-408, Aug. 1985.
[13] L. A. Goodman and W. H. Kruskal, "Measures of association for cross-classification,"J. Amer. Statist. Soc., vol. 49, pp. 732-764, Dec. 1954.
[14] A. Ijbema and H. Blanken, "Estimating bucket accesses: A practical approach," inProc. Conf. Data Eng. (COMPDEC), Los Angeles, CA, Feb. 1986, pp. 30-37.
[15] M. Y. L. Ip, L. V. Saxton, and V. V. Raghavan, "On the selection of an optimal set of indexes,"IEEE Trans. Software Eng., vol. SE-9, pp. 135-143, Mar. 1983.
[16] J. G. Kollias, "A heuristic approach for determining the optimal degree of file inversion,"Inform. Syst., vol. 4, no. 1, pp. 307-318, 1979.
[17] W. Luk, "On estimating block accesses in database organizations,"Commun. ACM, vol. 26, no. 11, pp. 945-947, Nov. 1983.
[18] L. F. Mackert and G. M. Lohman, "Index scans using a finite LRU buffer: A validated I/O model,"ACM Trans. Database Syst., vol. 14, no. 3, pp. 401-424, Sept. 1989.
[19] D. Maio, M. R. Scalas, and P. Tiberio, "On estimating access costs in relational databases,"Inform. Processing Lett., vol. 19, no. 3, pp. 157-161, 1984.
[20] M. V. Mannino, P. C. Chu, and T. Sager, "Statistical profile estimation in database systems,"ACM Comput. Surveys, vol. 20, no. 3, pp. 191-221, Sept. 1988.
[21] H. T. Reynolds,Analysis of Nominal Data, Beverly Hills, CA: Sage, 1984.
[22] M. Schkolnick, "The optimal selection of secondary indices for files,"Inform. Syst., vol. 1, pp. 141-146, 1975.
[23] M. Stonebraker, "The choice of partial inversions and combined indices,"Int. J. Comput. Inform. Sci., vol. 3, no. 2, pp. 167-188, June 1974.
[24] B. Vander Zander, H. Taylor, and D. Bilton, "Estimating block accesses when attributes are correlated," inProc. 12th Int. Conf. Very Large Databases, Kyoto, Japan, Aug. 1986, pp. 119-127.
[25] K. Whang, G. Wiederhold, and D. Sagalowicz, "Estimating block accesses in database organizations: A closed noniterative formula,"Commun. ACM, vol. 26, no. 11, pp. 940-944, Nov. 1983.
[26] K. Y. Whang, G. Wiederhold, and D. Sagalowitz, "Separability--An approach to physical database design,"IEEE Trans. Comput., vol. C-33, pp. 209-222, Mar. 1984.
[27] S. B. Yao, "Approximating block accesses in database organizations,"Commun. ACM, vol. 20, pp. 260-261, Apr. 1977.
[28] J. Zahorjan, B. Bell, and K. Sevcik, "Estimating block transfers when record access probabilities are non-uniform,"Inform. Processing Lett., vol. 16, no. 5, pp. 249-252, 1983.

Index Terms:
physical database design; clustered index; primary access path; auxiliary access paths; index-selection algorithm; block selectivity; database management systems
P.-C. Chu, "Estimating Block Selectivities for Physical Database Design," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 1, pp. 89-98, Feb. 1992, doi:10.1109/69.124900
Usage of this product signifies your acceptance of the Terms of Use.