This Article 
 Bibliographic References 
 Add to: 
Automating Statistics Management for Query Optimizers
January/February 2001 (vol. 13 no. 1)
pp. 7-20

Abstract—Statistics play a key role in influencing the quality of plans chosen by a database query optimizer. In this paper, we identify the statistics that are essential for an optimizer. We introduce novel techniques that help significantly reduce the set of statistics that need to be created without sacrificing the quality of query plans generated. We discuss how these techniques can be leveraged to automate statistics management in databases. We have implemented and experimentally evaluated our approach on Microsoft SQL Server 7.0.

[1] The AutoAdmin Project, Microsoft Research. / .
[2] S. Choenni, H. Blanken, and T. Chang, Index Selection in Relational Databases Proc. Fifth IEEE Int'l Conf. Computing and Information, 1993.
[3] M. Charikar, S. Chaudhuri, R. Motwani, and V.R. Narasayya, “Towards Estimation Error Guarantees for Distinct Values,” Proc. 19th Symp. Principles of Database Systems, pp. 268-279, 2000.
[4] S. Chaudhuri, R. Motwani, and V. Narasayya, Random Sampling for Histogram Construction: How Much Is Enough? Proc. SIGMOD, pp. 436-447, June 1998.
[5] S. Chaudhuri and V. Narasayya, “An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server,” Proc. 23rd Very Large Data Bases Conf., 1997.
[6] S. Finkelstein, M. Schkolnick, and P. Tiberio, “Physical Database Design for Relational Databases,” ACM Trans. Database Systems, Mar. 1988.
[7] M. Frank, E. Omiecinski, and S. Navathe, “Adaptive and Automative Index Selection in RDBMS,” Proc. Int'l Conf. Extending Database Technology (EDBT), 1992.
[8] P.B. Gibbons, Y. Matias, and V. Poosala, “Fast Incremental Maintenance of Approximate Histograms,” Proc. Very Large Data Bases Conf., 1997.
[9] P.J. Haas, J.F. Naughton, S. Seshadri, and L. Stokes, “Sampling-Based Estimation of the Number of Distinct Values of an Attribute,” Proc. 21st Int'l Conf. Very Large Databases, pp. 311-322, 1995.
[10] Y. Ioannidis and V. Poosala, “Balancing Histogram Optimality and Practicality for Query Result Size Estimation,” Proc. ACM SIGMOD, 1995.
[11] W. Labio, D. Quass, and B. Adelberg, “Physical Database Design for Data Warehousing,” Proc. Int'l Conf. Data Eng., 1997.
[12] G.S. Manku, S. Rajagopalan, and B. Lindsay, “Approximate Medians and Other Quantiles in One Pass and with Limited Memory,” Proc. ACM SIGMOD, 1998.
[13] V. Poosala and Y. Ioannidis, “Selectivity Estimation without the Attribute Value Independence Assumption,” Proc. 23rd Int'l Conf. Very Large Data Bases (VLDB), 1997.
[14] V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita, “Improved Histograms for Selectivity Estimation of Range Predicates,” Proc. ACM SIGMOD 1996, pp. 294-305, 1996.
[15] D.R. Slutz, “Massive Stochastic Testing of SQL,” Proc. 24th Int'l Conf. Very Large Data Bases (VLDB '98), A. Gupta, O. Shmueli, and J. Widom, eds., pp. 618-622, Aug. 1998.
[16] TPC, TPC Benchmark D. (Decision Support). Working Draft 6.0, Aug. 1993.
[17] “TPC-D Data Generation with Skew,” S. Chaudhuri and V. Narasayya, eds., Jan. 1999. Available via anonymous ftp tpcdskew.

Index Terms:
Statistics, workload, query optimizer, query plan.
Surajit Chaudhuri, Vivek Narasayya, "Automating Statistics Management for Query Optimizers," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 7-20, Jan.-Feb. 2001, doi:10.1109/69.908978
Usage of this product signifies your acceptance of the Terms of Use.