International Workshop on Challenges in Web Information Retrieval and Integration
A Fast Linkage Detection Scheme for Multi-Source Information Integration
Tokyo, Japan
April 8-9, 2005
ISBN: 0-7695-2414-1
Akiko Aizawa, National Institute of Informatics and The Graduate University for Advanced Studies, Hitotsubashi, Chiyoda-ku, Japan
Keizo Oyama, National Institute of Informatics and The Graduate University for Advanced Studies, Hitotsubashi, Chiyoda-ku, Japan

Record linkage refers to techniques for identifying records associated with the same real-world entities. Record linkage is crucial not only for integrating independently generated multi-source databases, but is also considered one of the key issues in integrating heterogeneous Web resources. However, for large-scale data, the cost of enumerating all possible linkages often becomes prohibitively high. Against this background, this paper proposes a fast and efficient method for linkage detection. The proposed approach has two features. First, it exploits a suffix array structure that enables linkage detection using variable-length n-grams. Second, it dynamically generates blocks of possibly associated records using "blocking keys" extracted from already known reliable linkages. The paper also reports preliminary experiments in which the proposed method was applied to the integration of four bibliographic databases scaling up to more than 10 million records.
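To illustrate why a suffix array supports matching on variable-length n-grams, the following Python sketch finds candidate record pairs that share any substring of at least `min_len` characters. It is not the authors' implementation: the naive suffix-array construction, the hypothetical function names (`build_suffix_array`, `candidate_pairs`), and the `min_len` threshold are assumptions made for illustration, and the blocking-key step is not modeled.

```python
from itertools import combinations


def build_suffix_array(records):
    """Return (record_id, offset) pairs sorted by the suffix each one starts.

    Naive construction by sorting suffix slices (an assumption for this
    sketch); it does not scale to millions of records.
    """
    suffixes = [
        (rid, i)
        for rid, text in enumerate(records)
        for i in range(len(text))
    ]
    suffixes.sort(key=lambda s: records[s[0]][s[1]:])
    return suffixes


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def _pairs_within(run):
    """All pairs of distinct record ids in one run of adjacent suffixes."""
    return set(combinations(sorted(set(run)), 2))


def candidate_pairs(records, min_len):
    """Record-id pairs sharing a substring of at least min_len characters.

    Suffixes with a common prefix are contiguous in sorted order, so a run
    of adjacent suffixes whose neighbouring LCPs are all >= min_len shares a
    prefix of at least min_len, whatever the actual match length is.
    """
    sa = build_suffix_array(records)
    pairs = set()
    run = [sa[0][0]] if sa else []
    for prev, cur in zip(sa, sa[1:]):
        lcp = common_prefix_len(records[prev[0]][prev[1]:],
                                records[cur[0]][cur[1]:])
        if lcp >= min_len:
            run.append(cur[0])          # extend the current run
        else:
            pairs |= _pairs_within(run)  # close the run, start a new one
            run = [cur[0]]
    pairs |= _pairs_within(run)
    return pairs


records = ["fast linkage detection", "a fast linkage scheme", "suffix arrays"]
print(candidate_pairs(records, min_len=12))  # {(0, 1)}
```

Records 0 and 1 share "fast linkage " (13 characters), so they become a candidate pair without fixing n in advance. A realistic version would replace the slice-sorting construction with a linear-time or O(n log n) suffix-array algorithm.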

Citation:
Akiko Aizawa, Keizo Oyama, "A Fast Linkage Detection Scheme for Multi-Source Information Integration," International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), pp. 30-39, 2005.