Efficient Queries over Web Views
November/December 2002 (vol. 14 no. 6)
pp. 1280-1298

Abstract—Large Web sites are becoming repositories of structured information that can benefit from being viewed and queried as relational databases. However, querying these views efficiently requires new techniques. Data usually resides at a remote site and is organized as a set of related HTML documents, with network access being a primary cost factor in query evaluation. This cost can be reduced by exploiting the redundancy often found in site design. We use a simple data model, a subset of the Araneus data model, to describe the structure of a Web site. We augment the model with link and inclusion constraints that capture the redundancies in the site. We map relational views of a site to a navigational algebra and show how to use the constraints to rewrite algebraic expressions, reducing the number of network accesses. We show that similar techniques can be used to maintain materialized views over sets of HTML pages.
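
As a rough illustration of the rewriting idea, the following Python sketch compares a plan that follows every link with a plan rewritten under a link constraint. It is not the paper's algebra or the Araneus model: the toy site, the page attributes (`dept_name`, `dept_link`), the `Fetcher` class, and both plan functions are hypothetical names introduced only for this example. The assumed link constraint states that the `dept_name` shown on a professor page equals the `name` on the department page it links to, so the department pages need not be downloaded to answer the query.

```python
# Toy in-memory "site": URL -> page attributes. All names are hypothetical.
SITE = {
    "/profs": {"prof_links": ["/prof/ada", "/prof/bob"]},
    "/prof/ada": {"name": "Ada", "dept_name": "CS", "dept_link": "/dept/cs"},
    "/prof/bob": {"name": "Bob", "dept_name": "Math", "dept_link": "/dept/math"},
    "/dept/cs": {"name": "CS"},
    "/dept/math": {"name": "Math"},
}


class Fetcher:
    """Stands in for network access and counts page downloads."""

    def __init__(self, site):
        self.site = site
        self.count = 0

    def get(self, url):
        self.count += 1
        return self.site[url]


def naive_plan(fetcher):
    """Answer 'professor name, department name' by following every link."""
    rows = []
    for prof_url in fetcher.get("/profs")["prof_links"]:
        prof = fetcher.get(prof_url)
        dept = fetcher.get(prof["dept_link"])  # one extra download per professor
        rows.append((prof["name"], dept["name"]))
    return rows


def rewritten_plan(fetcher):
    """Same query, rewritten under the assumed link constraint
    prof.dept_name == dept.name, so department pages are never fetched."""
    rows = []
    for prof_url in fetcher.get("/profs")["prof_links"]:
        prof = fetcher.get(prof_url)
        rows.append((prof["name"], prof["dept_name"]))
    return rows


naive_fetcher = Fetcher(SITE)
rewritten_fetcher = Fetcher(SITE)
naive_rows = naive_plan(naive_fetcher)
rewritten_rows = rewritten_plan(rewritten_fetcher)
```

On this toy site both plans return the same rows, while the rewritten plan downloads three pages instead of five. The saving depends entirely on the constraint actually holding on the site; if the redundant attribute were stale, the rewritten plan would return different answers.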

Index Terms:
Web, query languages, query optimization, view maintenance.
Giansalvatore Mecca, Alberto O. Mendelzon, Paolo Merialdo, "Efficient Queries over Web Views," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 6, pp. 1280-1298, Nov.-Dec. 2002, doi:10.1109/TKDE.2002.1047768