loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
21st International Conference on Data Engineering (ICDE'05)
Schema Matching Using Duplicates
Tokyo, Japan
April 05-April 08
ISBN: 0-7695-2285-8
Alexander Bilke, Technische Universität Berlin
Felix Naumann, Humboldt-Universität zu Berlin

Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names.

Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Now, our schema matching algorithm is able to identify corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.

Citation:
Alexander Bilke, Felix Naumann, "Schema Matching Using Duplicates," icde, pp.69-80, 21st International Conference on Data Engineering (ICDE'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.