Issue No.02 - February (2008 vol.20)
pp: 246-260
ABSTRACT
High-dimensional databases pose a challenge with respect to efficient access. High-dimensional indexes do not work because of the oft-cited "curse of dimensionality". However, users are usually interested in querying data over a relatively small subset of the entire attribute set at a time. A potential solution is to use lower dimensional indexes that accurately represent the user access patterns. Query response using physical database design developed based on a static snapshot of the query workload may significantly degrade if the query patterns change. To address these issues, we introduce a parameterizable technique to recommend indexes based on index types frequently used for high-dimensional data sets and to dynamically adjust indexes as the underlying query workload changes. We incorporate a query pattern change detection mechanism to determine when the access patterns have changed enough to warrant change in the physical database design. By adjusting analysis parameters, we trade off analysis speed against analysis resolution. We perform experiments with a number of data sets, query sets, and parameters to show the effect that varying these characteristics has on analysis results.
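The idea of deriving low-dimensional index candidates from a query workload, and of noticing when the workload has drifted, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the cost model, the index types, or the change-detection test. The sketch assumes each query is reduced to the set of attributes it touches, treats attribute subsets that occur in a minimum fraction of queries as candidates, and flags a change when the candidate sets of two workload windows have low Jaccard overlap. The function names and the `min_support`, `max_size`, and `max_overlap` parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(workload, min_support, max_size=3):
    """Return attribute subsets queried together in at least
    min_support (a fraction) of the queries in the workload.

    workload: list of iterables, each holding the attributes one query uses.
    max_size: largest subset size considered (bounds the analysis cost).
    """
    counts = Counter()
    for attrs in workload:
        attrs = sorted(set(attrs))  # canonical order, duplicates removed
        for k in range(1, min(max_size, len(attrs)) + 1):
            for subset in combinations(attrs, k):
                counts[subset] += 1
    threshold = min_support * len(workload)
    return {s: c for s, c in counts.items() if c >= threshold}


def workload_changed(old_sets, new_sets, max_overlap=0.5):
    """Assumed change test: report a change when the Jaccard overlap
    between the two candidate sets falls below max_overlap."""
    old, new = set(old_sets), set(new_sets)
    if not old and not new:
        return False
    jaccard = len(old & new) / len(old | new)
    return jaccard < max_overlap


# Two hypothetical workload windows over attributes a, b, c, x, y, z.
old_window = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "b")]
new_window = [("x", "y"), ("x", "y"), ("x",), ("y", "z")]

old_candidates = frequent_attribute_sets(old_window, min_support=0.5)
new_candidates = frequent_attribute_sets(new_window, min_support=0.5)
changed = workload_changed(old_candidates, new_candidates)
```

In this sketch, lowering `min_support` or raising `max_size` examines more candidate subsets at a higher analysis cost, which loosely mirrors the speed-versus-resolution trade-off the abstract mentions.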
INDEX TERMS
Indexing methods, Active databases, Database Administration
CITATION
Michael Gibas, Guadalupe Canahuate, Hakan Ferhatosmanoglu, "Online Index Recommendations for High-Dimensional Databases Using Query Workloads", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 2, pp. 246-260, February 2008, doi:10.1109/TKDE.2007.190690