17th International Conference on Database and Expert Systems Applications (DEXA'06)
Finding Syntactic Similarities Between XML Documents
Krakow, Poland
September 4-8, 2006
ISBN: 0-7695-2641-1
Davood Rafiei, University of Alberta, Canada
Daniel L. Moise, University of Alberta, Canada
Dabo Sun, University of Alberta, Canada
Detecting structural similarities between XML documents has been the subject of several recent works, and the proposed algorithms mostly use the tree edit distance between the corresponding trees of the XML documents. However, evaluating a tree edit distance is computationally expensive and does not easily scale up to large collections. We show in this paper that a tree edit distance computation is often unnecessary and can be avoided. In particular, we propose a concise structural summary of XML documents and show that a comparison based on this summary is both fast and effective. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on a tree comparison. Furthermore, the time complexity of the algorithm is linear in the size of the structural description.
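The abstract does not say what the structural summary contains, so the following is only an illustrative sketch of the general idea of comparing summaries instead of trees. It assumes, purely for illustration, that a document is summarized by its set of distinct root-to-element tag paths and that two summaries are compared by Jaccard overlap. The function names `path_summary` and `summary_similarity` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Return the set of distinct root-to-element tag paths in a document.

    Assumption: this path set stands in for the paper's structural summary,
    whose actual definition is not given in the abstract.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)  # repeated siblings collapse into a single path
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def summary_similarity(xml_a, xml_b):
    """Jaccard overlap of two path summaries (an assumed measure)."""
    a, b = path_summary(xml_a), path_summary(xml_b)
    # Each summary holds at least the root path, so the union is non-empty.
    return len(a & b) / len(a | b)


doc1 = ("<lib><book><title>A</title></book>"
        "<book><title>B</title><author>X</author></book></lib>")
doc2 = "<lib><book><title>C</title><author>Y</author></book></lib>"
doc3 = "<order><item><price>1</price></item></order>"

print(summary_similarity(doc1, doc2))  # same structure, different content
print(summary_similarity(doc1, doc3))  # unrelated structure
```

Building each summary takes one pass over the document, and the comparison cost depends on the size of the two summaries rather than on the size of the trees, which is the kind of saving over tree edit distance that the abstract describes.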
Citation:
Davood Rafiei, Daniel L. Moise, Dabo Sun, "Finding Syntactic Similarities Between XML Documents," 17th International Conference on Database and Expert Systems Applications (DEXA'06), pp. 512-516, 2006