Detecting and Representing Relevant Web Deltas in WHOWEDA
March/April 2003 (vol. 15 no. 2)
pp. 423-441
Sourav S. Bhowmick, IEEE Computer Society
Wee Keong Ng, IEEE Computer Society

Abstract: In this paper, we present a mechanism for detecting and representing changes, given the old and new versions of a set of interlinked Web documents retrieved in response to a user's query. In particular, we show how to detect and represent Web deltas, i.e., changes in the Web documents that are relevant to a user's query, in the context of our Web warehousing system called Whoweda (Warehouse of Web Data). In Whoweda, Web information is held as materialized views stored in Web tables, in the form of Web tuples. These Web tuples, represented as directed graphs, can be manipulated using a set of Web algebraic operators. We detect relevant Web deltas using Web algebraic operators such as the Web join and the outer Web join. Web join is used to detect identical documents residing in two Web tables, whereas outer Web join, a derivative of Web join, is used to identify dangling Web tuples. We show how to represent these changes using delta Web tables. We develop formal algorithms for the generation of delta Web tables that identify Web documents which have been added, deleted, or modified since the last query.
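
The join-based idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: Whoweda operates on Web tuples (directed graphs) with its own Web algebraic operators, whereas the code below assumes each version is a flat collection of documents keyed by URL, and that a hypothetical `checksum` field signals a content change. The names `Doc` and `detect_deltas` are illustrative only. Matching on URL plays the role of the join (documents present in both versions), and the unmatched entries on either side play the role of the dangling results of an outer join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    checksum: str  # assumption: any value that differs when content changes


def detect_deltas(old_docs, new_docs):
    """Classify documents as added, deleted, modified, or unchanged."""
    old = {d.url: d for d in old_docs}
    new = {d.url: d for d in new_docs}

    # Join-like step: URLs present in both versions.
    shared = old.keys() & new.keys()
    unchanged = {u for u in shared if old[u].checksum == new[u].checksum}

    # Outer-join-like step: entries with no partner in the other version.
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(shared - unchanged),
        "unchanged": sorted(unchanged),
    }


old_version = [Doc("a.html", "1"), Doc("b.html", "2"), Doc("c.html", "3")]
new_version = [Doc("a.html", "1"), Doc("b.html", "9"), Doc("d.html", "4")]
deltas = detect_deltas(old_version, new_version)
print(deltas)
```

In this toy input, `b.html` is reported as modified, `c.html` as deleted, and `d.html` as added. The three change lists loosely correspond to the contents of the delta Web tables mentioned in the abstract, although the paper's tables hold Web tuples rather than bare URLs.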

Index Terms:
Web deltas, Web warehouse, Web join, outer Web join, delta Web tables, algorithm.
Citation:
Sourav S. Bhowmick, Sanjay Kumar Madria, Wee Keong Ng, "Detecting and Representing Relevant Web Deltas in WHOWEDA," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 423-441, March-April 2003, doi:10.1109/TKDE.2003.1185843