This Article 
 Bibliographic References 
 Add to: 
Index Selection for Databases: A Hardness Study and a Principled Heuristic Solution
November 2004 (vol. 16 no. 11)
pp. 1313-1323
We study the index selection problem: Given a workload consisting of SQL statements on a database, and a user-specified storage constraint, recommend a set of indexes that have the maximum benefit for the given workload. We present a formal statement for this problem and show that it is computationally "hard” to solve or even approximate it. We develop a new algorithm for the problem which is based on treating the problem as a knapsack problem. The novelty of our approach lies in an LP (linear programming) based method that assigns benefits to individual indexes. For a slightly modified algorithm, that does more work, we prove that we can give instance specific guarantees about the quality of our solution. We conduct an extensive experimental evaluation of this new heuristic and compare it with previous solutions. Our results demonstrate that our solution is more scalable while achieving comparable quality.

[1] S. Agrawal, S. Chaudhuri, and V. Narasayya, Automated Selection of Materialized Views and Indexes for SQL Databases Proc. 26th Int'l Conf. Very Large Databases (VLDB00), pp. 496-505, 2000.
[2] APB-I, OLAP Benchmark, Release II, OLAP Council,http:/, Nov. 1998.
[3] M. Bellare, Interactive Proofs and Approximation: Reductions from Two Provers in One Round Proc. Second Israel Symp. Theory of Computing Systems, pp. 266-274, 1993.
[4] S. Chaudhuri and V. Narasayya, An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server Proc. 23rd Int'l Conf. Very Large Databases (VLDB), pp. 146-155, 1997.
[5] S. Chaudhuri and V. Narasayya, AutoAdmin‘What-If’Index Analysis Utility Proc. ACM SIGMOD, 1998.
[6] S. Choenni, H. Blanken, and T. Chang, Index Selection in Relational Databases Proc. Fifth IEEE Int'l Conf. Computing and Information, 1993.
[7] D. Comer, The Difficulty of Optimum Index Selection ACM Trans. Database Systems, vol. 3, no. 4, pp. 440-445, Dec. 1978.
[8] L. Engebretsen, Lower Bounds for Non-Boolean Constraint Satisfaction Programs Technical Report TR00-042, Electronic Colloquium on Computational Complexity, June 2000.
[9] U. Feige and M.K. Goemans, Approximation the Value of Two Prover Proof Systems, with Applications to MAX 2SAT and MAX DICUT Proc. Third Israel Symp. Theory of Computing and Systems, pp. 182-189, 1995.
[10] U. Feige, G. Kortsarz, and D. Peleg, The Dense k-Subgraph Problem Algorithmica, vol. 29, pp. 410-421, 2001.
[11] S. Finkelstein, M. Schkolnick, and P. Tiberio, Physical Database Design for Relational Databases Proc. ACM Trans. Database Systems, Mar. 1988.
[12] R. Garfinkel and G. Nemhauser, Integer Programming. New York: John Wiley&Sons, pp. 214-241, 1972.
[13] M.X. Goemans and D.P. Williamson, 878-Approximation Algorithms for MAXCUT and MAX 2SAT Proc. 26th Ann. ACM Symp. Theory of Computing, pp. 422-431, 1994.
[14] H. Gupta, V. Harinarayan, A. Rajaramana, and J. Ullman, Index Selection for OLAP Proc. 13th Int'l Conf. Data Eng., 1997.
[15] J. Håstad, Some Optimal Inapproximability Results Proc. 29th Ann. ACM Symp. Theory of Computing, pp. 1-10, 1997.
[16] M. Hammer and A. Chan, Index Selection in a Self Adaptive Data Base Management System Proc. SIGMOD Conf., 1976.
[17] C. Heeren, H.V. Jagadish, and L. Pitt, Optimal Indexing Using Near-Minimal Space Proc. ACM Symp. Principles of Database Systems (PODS), pp. 244-251, 2003.
[18] V. Harinarayan, A. Rajaramana, and J. Ullman, Implementing Data Cubes Efficiently Proc. SIGMOD, 1996.
[19] D. Hochbaum, Approximation Algorithms for NP-Hard Problems. PWS Publishing Co., 1997.
[20] H. Karloff and M. Michail, On the Complexity of the View Selection Problem Proc. ACM Symp. Principles of Database Systems (PODS), 1999.
[21] G. Lohman, A. Skelley, G. Valentin, D. Zillio, and M. Zuliani, DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes Proc. 16th Int'l Conf. Data Eng., pp. 101-110, 2000.
[22] S. Rozen and D. Shasha, A Framework for Automating Physical Database Design Proc. Conf. Very Large Databases, 1991.
[23] M. Serna, L. Trevisan, and F. Xhafa, The (Parallel) Approximability of Non-Boolean Satisfiability Problems and Restricted Integer Programming Proc. 15th Ann. Symp. Theoretical Aspects of Computer Science, pp. 488-498, 1998.
[24] A. Shukla, P. Deshpande, and J. Naughton, Materialized View Selection for Multi-Cube Data Models Proc. Int'l Conf. Extending Database Technology, 2000.
[25] TPC Benchmark H, Decision Support,http:/, 2004.

Index Terms:
Index selection, approximation, hardness result, NP-hardness, knapsack, linear programming, scalability.
Surajit Chaudhuri, Mayur Datar, Vivek Narasayya, "Index Selection for Databases: A Hardness Study and a Principled Heuristic Solution," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, pp. 1313-1323, Nov. 2004, doi:10.1109/TKDE.2004.75
Usage of this product signifies your acceptance of the Terms of Use.