22nd International Conference on Data Engineering (ICDE'06)
Detecting Duplicates in Complex XML Data
Atlanta, Georgia
April 03-April 07
ISBN: 0-7695-2570-9
Melanie Weis, Humboldt-Universität zu Berlin
Felix Naumann, Humboldt-Universität zu Berlin
Recent work in both the relational and the XML world has shown that the efficacy and efficiency of duplicate detection are enhanced by regarding relationships between entities. However, most approaches for XML data rely on 1:n parent/child relationships and do not apply to XML data that represents m:n relationships.

We present a novel comparison strategy that performs duplicate detection effectively for all kinds of parent/child relationships, given dependencies between different XML elements. Because these dependencies can be cyclic, a pairwise classification may be performed more than once, which compromises efficiency. We propose a comparison order that reduces the number of such reclassifications and apply it to two algorithms. The first algorithm performs reclassifications, and the order improves its efficiency by reducing how many are needed. The second algorithm never performs a comparison more than once, and the order ensures that it misses few reclassifications and hence few potential duplicates.
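
To make the role of the comparison order concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical scoring rule (a pair's own similarity plus a bonus for related pairs already classified as duplicates), hypothetical constants BONUS and THRESHOLD, and hand-chosen orders rather than the order proposed in the paper. The toy data models a movie pair and two actor pairs that depend on each other cyclically.

```python
from collections import deque

BONUS = 0.3      # assumed weight given to related pairs already found to be duplicates
THRESHOLD = 0.7  # assumed score at or above which a pair is classified as duplicate


def score(pair, own_sim, depends_on, duplicates):
    """Hypothetical score: own similarity plus a bonus for confirmed related pairs."""
    deps = depends_on.get(pair, [])
    if not deps:
        return own_sim[pair]
    confirmed = sum(1 for d in deps if d in duplicates)
    return own_sim[pair] + BONUS * confirmed / len(deps)


def detect_duplicates(order, own_sim, depends_on, reclassify=True):
    """Classify candidate pairs in the given order.

    With reclassify=True, a newly found duplicate re-enqueues the pairs that
    depend on it. With reclassify=False, every pair is compared exactly once.
    Returns (set of duplicate pairs, number of comparisons performed).
    """
    # Invert the dependency map: which pairs are influenced by a given pair?
    dependents = {}
    for pair, deps in depends_on.items():
        for d in deps:
            dependents.setdefault(d, []).append(pair)

    duplicates = set()
    queue = deque(order)
    queued = set(order)
    comparisons = 0
    while queue:
        pair = queue.popleft()
        queued.discard(pair)
        comparisons += 1
        if score(pair, own_sim, depends_on, duplicates) < THRESHOLD:
            continue
        duplicates.add(pair)
        if not reclassify:
            continue
        for dep in dependents.get(pair, []):
            # Pairs still waiting in the queue will see the new duplicate anyway.
            if dep not in duplicates and dep not in queued:
                queue.append(dep)
                queued.add(dep)
    return duplicates, comparisons


# Toy candidate pairs (hypothetical element identifiers).
M = ("movie1", "movie2")
A = ("actor1", "actor2")
B = ("actor3", "actor4")

OWN_SIM = {M: 0.6, A: 0.9, B: 0.6}
# m:n relationship: the movie pair depends on both actor pairs and vice versa.
DEPENDS_ON = {M: [A, B], A: [M], B: [M]}

POOR_ORDER = [M, B, A]
GOOD_ORDER = [A, M, B]
```

On this toy data, the algorithm with reclassification finds all three duplicate pairs under either order, but needs 5 comparisons with POOR_ORDER and only 3 with GOOD_ORDER. The single-pass variant (reclassify=False) always performs 3 comparisons, yet finds only the actor pair A with POOR_ORDER and all three pairs with GOOD_ORDER.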

Citation:
Melanie Weis, Felix Naumann, "Detecting Duplicates in Complex XML Data," icde, pp.109, 22nd International Conference on Data Engineering (ICDE'06), 2006