This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Selection of Views to Materialize in a Data Warehouse
January 2005 (vol. 17 no. 1)
pp. 24-43
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space, etc. In this article, we have developed a theoretical framework for the general problem of selection of views in a data warehouse. We present polynomial-time heuristics for a selection of views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, viz.: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to general a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We present proofs showing that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant factor ratio of) the optimal solution. We extend our heuristic to the general AND-OR view graphs. Finally, we address in detail the view-selection problem under the maintenance cost constraint and present provably competitive heuristics.

[1] S. Chaudhuri and K. Shim, “Including Groupby in Query Optimization,” Proc. Int'l Conf. Very Large Database Systems, 1994.
[2] A. Gupta, V. Harinarayan, and D. Quass, “Generalized Projections: A Powerful Approach to Aggregation,” Proc. Int'l Conf. Very Large Database Systems, 1995.
[3] W. Yan and P. Larson, “Eager Aggregation and Lazy Aggregation,” Proc. 21st Int'l Conf. Very Large Databases (VLDB), pp. 345-357, 1995.
[4] P. O'Neil and G. Graefe, “Multi-Table Joins through Bitmapped Join Indices,” SIGMOD Record, vol. 24, no. 3, pp. 8-11, 1995.
[5] J. Widom, “Research Problems in Data Warehousing,” Proc. Fourth Int'l Conf. Information and Knowledge Management, 1995.
[6] R. Kimball, The Data Warehouse Toolkit. John Wiley & Sons, Inc., 1996.
[7] N. Roussopoulos, “The Logical Access Path Schema of a Database,” IEEE Trans. Software Eng., vol. 8, pp. 563-573, Nov. 1982.
[8] T.K. Sellis, “Multiple Query Optimization,” ACM Trans. Database Systems, vol. 13, 1988.
[9] A. Cosar, E.-P. Lim, and J. Srivastava, “Multiple Query Optimization with Depth-First Branch-and-Bound and Dynamic Query Ordering,” Proc. Conf. Information and Knowledge Management (CIKM), 1993.
[10] V. Harinarayan, A. Rajaraman, and J. Ullman, “Implementing Data Cubes Efficiently,” Proc. ACM SIGMOD Int'l Conf. Mangement of Data, 1996.
[11] H. Gupta, V. Harinarayan, A. Rajaraman, and J. Ullman, “Index Selection in OLAP,” Proc. Int'l Conf. Data Eng., 1997.
[12] U. Feige, “A Threshold of ln n for Approximating Set Cover,” Proc. 28th Ann. ACM Symp. Theory of Computing (STOC), 1996.
[13] U.S. Chakravarthy and J. Minker, “Processing Multiple Queries in Database Systems,” Database Eng., vol. 5, pp. 38-44, Sept. 1982.
[14] H. Gupta, “Selection and Maintenance of Materialized Views in a Data Warehouses,” PhD thesis, Stanford Univ., Dept. Computer Science, 1999.
[15] H. Gupta, “Selection of Views to Materialize in a Data Warehouse,” Proc. Int'l Conf. Database Theory, 1997.
[16] N. Nilsson, Principles of Artificial Intelligence. Morgan Kaufmann Publishers, Inc., 1980.
[17] I. Mumick, D. Quass, and B. Mumick, “Maintenance of Data Cubes and Summary Tables in a Warehouse,” Proc. ACM SIGMOD Int'l Conf. Mangement of Data, 1997.
[18] H. Gupta and I. Mumick, “Selection of Views to Materialize under a Maintenance Cost Constraint,” Proc. Int'l Conf. Database Theory, 1999.
[19] H. Karloff and M. Mihail, “On the Complexity of the View-Selection Problem,” Proc. Symp. Principles of Database Systems (PODS), 1999.
[20] R. Chirkova, A. Halevy, and D. Suciu, “A Formal Perspective on the View Selection Problem,” Proc. Int'l Conf. Very Large Database Systems, 2001.
[21] R. Chirkova, “The View Selection Problem Has an Exponential Bound for Conjunctive Queries and Views,” Proc. ACM Symp. Principles of Database Systems, 2002.
[22] J. Yang, K. Karlapalem, and Q. Li, “Algorithms for Materialized View Design in Data Warehousing Environment,” Proc. Int'l Conf. Very Large Database Systems, 1997.
[23] E. Baralis, S. Paraboschi, and E. Teniente, “Materialized View Selection in a Multidimensional Database,” Proc. Int'l Conf. Very Large Database Systems, 1997.
[24] D. Theodoratos and T. Sellis, “Data Warehouse Configuration,” Proc. Very Large Data Base Conf., 1997.
[25] A. Shukla, P. Deshpande, and J. Naughton, “Materialized View Selection for Multidimensional Datasets,” Proc. Int'l Conf. Very Large Database Systems, 1998.
[26] C. Zhang and J. Yang, “Genetic Algorithm for Materialized View Selection in Data Warehouse Environments,” Proc. Int'l Conf. Data Warehousing and Knowledge Discovery (DaWaK), 1999.
[27] M. Lee and J. Hammer, “Speeding Up Materialized View Selection in Data Warehouses Using a Randomized Algorithm,” Int'l J. Cooperative Information Systems, vol. 10, no. 3, 2001.
[28] P. Kalnis, N. Mamoulis, and D. Papadias, “View Selection Using Randomized Search,” Data and Knowledge Eng., vol. 42, no. 1, 2002.
[29] S. Chaudhuri and V. Narasayya, “An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server,” Proc. Int'l Conf. Very Large Database Systems, 1997.
[30] L. Colby et al., “Redbrick Vista: Aggregate Computation and Management,” Proc. Int'l Conf. Data Eng. (ICDE), 1998.
[31] S. Agrawal, S. Chaudhuri, and V. Narasayya, “Automated Selection of Materialized Views and Indexes in SQL Databases,” Proc. Int'l Conf. Very Large Database Systems, 2000.
[32] S. Chaudhuri and V. Narasayya, “Microsoft Index Tuning Wizard for SQL Server 7.0,” Proc. ACM SIGMOD Int'l Conf. Mangement of Data, 1998.
[33] H. Gupta, V. Navda, S. Das, and V. Chowdhary, “Energy-Efficient Gathering of Correlated Data in Sensor Networks,” technical report, State Univ. of New York, Stony Brook, 2004.
[34] B. Tan, S. Das, and H. Gupta, “Energy-Efficient Caching in Sensor Networks,” technical report, State Univ. of New York, Stony Brook, 2004.

Index Terms:
Views, view selection, data warehouse, materialization.
Citation:
Himanshu Gupta, Inderpal Singh Mumick, "Selection of Views to Materialize in a Data Warehouse," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 1, pp. 24-43, Jan. 2005, doi:10.1109/TKDE.2005.16
Usage of this product signifies your acceptance of the Terms of Use.