17th International Conference on Data Engineering (ICDE'01)
An Automated Change-Detection Algorithm for HTML Documents Based on Semantic Hierarchies
Heidelberg, Germany
April 2-6, 2001
ISBN: 0-7695-1001-9
Seung-Jin Lim, Brigham Young University
Yiu-Kai Ng, Brigham Young University
Abstract: Data at many Web sites change rapidly, and a significant amount of these data is presented in HTML documents that consist of markup and data contents. Although XML is becoming more popular for data exchange, the data contained in XML documents are by and large presented in HTML format using XSL(T). Since HTML was designed to "display" data from the human perspective, it is not trivial for a machine to detect (hierarchical) changes of data in an HTML document. In this paper, we propose a heuristic algorithm, called SCD, to automatically detect semantic changes of hierarchical data contents in any two HTML documents. Semantic changes differ from syntactic changes in that the latter refer to changes of data contents with respect to markup structures according to the HTML grammar. SCD requires neither preprocessing nor any prior knowledge of the internal structure of the source documents. The time complexity of SCD is $O((|X| \times |Y|) \log(|X| \times |Y|))$, where $|X|$ and $|Y|$ are the numbers of unique branches in the syntactic hierarchies of the two given HTML documents, respectively.
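To make the quantities in the complexity bound concrete, the following minimal sketch counts the unique branches in the syntactic (tag) hierarchy of two HTML documents. It is not the SCD algorithm, which the abstract does not spell out. It assumes that a "branch" is a root-to-text path of tag names, and the names `BranchCollector`, `unique_branches`, and `candidate_pairs` are hypothetical. Sorting all |X| × |Y| branch pairs is shown only because sorting n items costs O(n log n), which is one way a bound of this form can arise; whether SCD sorts pairs is an assumption.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BranchCollector(HTMLParser):
    """Collects the unique tag paths that lead to non-empty text.

    Assumption: a "branch" is a root-to-text path of tag names.
    """

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, root first
        self.branches = set()  # unique root-to-text tag paths

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.branches.add("/".join(self.stack))


def unique_branches(html):
    """Return the set of unique branches in one HTML document."""
    collector = BranchCollector()
    collector.feed(html)
    collector.close()
    return collector.branches


def candidate_pairs(x, y):
    """All |X| * |Y| branch pairs, sorted (hypothetical comparison step)."""
    return sorted((bx, by) for bx in x for by in y)


old = "<html><body><h1>News</h1><ul><li>a</li><li>b</li></ul></body></html>"
new = "<html><body><h1>News</h1><table><tr><td>a</td></tr></table></body></html>"

X = unique_branches(old)
Y = unique_branches(new)
pairs = candidate_pairs(X, Y)
print(len(X), len(Y), len(pairs))
```

Note that the two `<li>` items in the first document share one branch, so |X| counts distinct paths rather than text nodes.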
Citation:
Seung-Jin Lim, Yiu-Kai Ng, "An Automated Change-Detection Algorithm for HTML Documents Based on Semantic Hierarchies," 17th International Conference on Data Engineering (ICDE'01), p. 303, 2001