This Article 
 Bibliographic References 
 Add to: 
Exact and Approximate Algorithms for the Index Selection Problem in Physical Database Design
December 1995 (vol. 7 no. 6)
pp. 955-967

Abstract—The index selection problem (ISP) is an important optimization problem in the physical design of databases. The aim of this paper is to show that ISP, although NP-hard, can in practice be solved effectively through well-designed algorithms. We formulate ISP as a 0-1 integer linear program and describe an exact branch-and-bound algorithm based on the linear programming relaxation of the model. The performance of the algorithm is enhanced by means of procedures to reduce the size of the candidate index set. We also describe heuristic algorithms based on the solution of a suitably defined knapsack subproblem and on Lagrangian decomposition. Finally, computational results on several classes of test problems are given. We report the exact solution of large-scale ISP instances involving several hundred indexes and queries. We also evaluate one of the heuristic algorithms we propose on very large-scale instances involving several thousand indexes and queries and show that it consistently produces very tight approximate (and sometimes provably optimal) solutions. Finally, we discuss possible extensions and future directions of research.

[1] H.D. Anderson and P.B. Berra,“Minimum cost selection of secondary indexes for formatted files,” ACM Trans. Database Systems, vol. 2, pp. 68-90, 1977.
[2] E. Barcucci,R. Pinzani,, and R. Sprugnoli,“Optimal selection of secondary indexes,” IEEE Trans. Software Engineering, vol. 16, pp. 32-38, 1990.
[3] E. Barcucci,A. Chiuderi,R. Pinzani,, and E. Rodella,“Optimal selection of secondary indices in relational databases,” Proc. Int’l Symp. Computer and Information Sciences I, M. Baray and B.Özgüç, eds., pp. 157-168. Elsevier, 1991.
[4] R. Bonanno,D. Maio,, and P. Tiberio,“An approximation algorithm for secondary index selection in relationaldatabase physical design,” The Computer J., vol. 28, pp. 398-405, 1985.
[5] F. Bonfatti,D. Maio,, and P. Tiberio,“A separability-based method for secondary index selection in physicaldatabase design,” Methodology and Tools for Data Base Design, S. Ceri, ed., pp. 149-160. NorthHolland, 1983.
[6] A.F. Cardenas, "Analysis and Performance of Inverted Data Base Structures," Comm. ACM, vol. 18, no. 5, pp. 253-263, May 1975.
[7] A. Caprara,“Un algoritmo esatto per la selezione di indici secondari nel progettofisico relazionale,” Tesi di Laurea, DEIS, Univ. of Bologna, 1991.
[8] P. Ciaccia and D. Maio,“On the optimal ordering of multiple-field tables,” Technical Report CIOC-CNR 86, Univ. of Bologna, 1992.
[9] D. Comer,“The difficulty of optimum index selection,” ACM Trans. Database Systems, vol. 3, pp. 440-445, 1978.
[10] H.P. Crowder,E.L. Johnson,, and M.W. Padberg,“Solving large-scale zero-one linear programming problems,” Operations Research, vol. 31, pp. 803-834, 1983.
[11] P. De,J.S. Park,, and H. Pirkul,“An integrated model ofrecord segmentation and access path selection fordatabases,” Information Systems, vol. 13, pp. 13-30, 1988.
[12] B.J. Falkowski,“Comments on an optimal set of indices for a relational database,” IEEE Trans. Software Engineering, vol. 18, pp. 168-171, 1992.
[13] S. Finkelstein, M. Schkolnick, and P. Tiberio, “Physical Database Design for Relational Databases,” ACM Trans. Database Systems, Mar. 1988.
[14] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness.New York: W.H. Freeman, 1979.
[15] M. Hatzopoulos and J.G. Kollias,“On the optimal selection of multilist database structures,” IEEE Trans. Software Engineering, vol. 10, pp. 681-687, 1984.
[16] M. Hatzopoulos and J.G. Kollias,“On the selection of a reduced set of indexes,” The Computer J., vol. 28, pp. 406-408, 1985.
[17] W.F. King,“On the selection of indices for a file,” IBM Research Report RJ 1641, San Jose, Calif., 1974.
[18] J.G. Kollias,“A heuristic approach for determining the optimal degree of fileinversion,” Information Systems, vol. 4, pp. 307-318, 1979.
[19] M.Y.L. Ip,L.V. Saxton,, and V.V. Raghavan,“On the selection of an optimal set of indexes,” IEEE Trans. Software Engineering, vol. 9, pp. 135-143, 1983.
[20] L.G. Khachian,“A polynomial algorithm in linear programming,” Soviet Mathematics Doklady, vol. 20, pp. 191-194, 1979.
[21] D. Maio,C. Sartori,, and M.R. Scalas,“Architecture of a physical design tool for relational DBMSs,” Computer Aided Data Base Design, A. Albano, V. De Antonellis, and A. Di Leva, eds., pp. 115-130. NorthHolland, 1985.
[22] D. Maio,C. Sartori,, and M.R. Scalas,“A modular user oriented decision support for physical databasedesign,” Decision Support Systems, vol. 3, pp. 155-163, 1987.
[23] R.E. Marsten, “The Design of the XMP Linear Programming Library,” ACM Trans. Mathematical Software, Vol. 7, No. 4, 1981, pp. 481-497.
[24] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementation. John Wiley and Sons, 1990.
[25] G.L. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization. New York: Wiley, 1988.
[26] M.W. Padberg and G. Rinaldi,“Optimization of a 532-city travelling salesman problem,” Operations Research Letters, vol. 6, pp. 1-8, 1987.
[27] A. Putkonen,“On the selection of access path in inverted database organization,” Information Systems, vol. 4, pp. 219-225, 1979.
[28] M. Schkolnick,“The optimal selection of secondary indices for files,” Information Systems, vol. 1, pp. 141-146, 1975.
[29] M. Stonebraker,“The choice of partial inversions and combined indices,” Int’l J. Computer and Information Sciences, vol. 3, pp. 167-188, 1974.
[30] K.-Y. Whang,G. Wiederhold,, and D. Sagalowicz,“Separability—An approach to physical database design,” IEEE Trans. Computers, vol. 33, pp. 209-222, 1984.
[31] P.S. Yu,M.-S. Chen,H. Heiss,, and S.H. Lee,“On workload characterization of relational database environments,” IEEE Trans on Software Engineering, vol. 18, no. 4, pp. 347-355, Apr. 1992.

Index Terms:
Index TermsIndex selection problem, relational database, physical database design, 0-1 integer linear programming, branch-and-bound algorithm, heuristic algorithm.
Alberto Caprara, Matteo Fischetti, Dario Maio, "Exact and Approximate Algorithms for the Index Selection Problem in Physical Database Design," IEEE Transactions on Knowledge and Data Engineering, vol. 7, no. 6, pp. 955-967, Dec. 1995, doi:10.1109/69.476501
Usage of this product signifies your acceptance of the Terms of Use.