Fifth IEEE International Conference on Data Mining (ICDM'05)
A Heterogeneous Field Matching Method for Record Linkage
Houston, Texas
November 27-30, 2005
ISBN: 0-7695-2278-5
Steven N. Minton, Fetch Technologies
Claude Nanjo, Fetch Technologies
Craig A. Knoblock, University of Southern California
Martin Michalowski, University of Southern California
Matthew Michelson, University of Southern California
Record linkage is the process of determining that two records refer to the same entity. A key subprocess is evaluating how well the individual fields, or attributes, of the records match each other. One approach to matching fields is to use hand-written, domain-specific rules. This "expert systems" approach may result in good performance for specific applications, but it is not scalable. This paper describes a new machine learning approach that creates expert-like rules for field matching. In our approach, the relationship between two field values is described by a set of heterogeneous transformations. Previous machine learning methods used simple models to evaluate the distance between two fields. Our approach, by contrast, enables more sophisticated relationships to be modeled, which better capture the complex, domain-specific, common-sense phenomena that humans use to judge similarity. We compare our approach to methods that rely on simpler homogeneous models in several domains. By modeling more complex relationships, we produce more accurate results.
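To illustrate what describing the relationship between two field values by a set of heterogeneous transformations might look like, here is a minimal sketch. It is not the paper's method: the transformation names ("equal", "initial", "prefix", "missing"), the greedy token alignment, and the function names label_pair and describe are all assumptions made for illustration, and the learning step that would weight such transformations is omitted.

```python
from typing import List, Optional, Tuple


def label_pair(a: str, b: str) -> Optional[str]:
    """Name the (hypothetical) transformation relating token a to token b."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    if not a or not b:
        return None
    if a == b:
        return "equal"
    short, long_ = sorted((a, b), key=len)
    if len(short) == 1 and long_.startswith(short):
        return "initial"
    if long_.startswith(short):
        return "prefix"
    return None


def describe(field_a: str, field_b: str) -> List[Tuple[str, str, str]]:
    """Greedily align tokens and record one transformation per token."""
    remaining = field_b.split()
    result: List[Tuple[str, str, str]] = []
    for tok in field_a.split():
        for other in remaining:
            kind = label_pair(tok, other)
            if kind is not None:
                result.append((tok, other, kind))
                remaining.remove(other)
                break
        else:
            # No token in field_b relates to this one.
            result.append((tok, "", "missing"))
    # Tokens of field_b that were never matched are also "missing".
    result.extend(("", other, "missing") for other in remaining)
    return result


if __name__ == "__main__":
    for row in describe("Steven N. Minton", "Steve Minton"):
        print(row)
```

A single distance score would collapse these differences into one number. The mixed list of transformations keeps them apart, so a learner could, for example, treat a dropped middle initial as weaker evidence against a match than a mismatched surname.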
Citation:
Steven N. Minton, Claude Nanjo, Craig A. Knoblock, Martin Michalowski, Matthew Michelson, "A Heterogeneous Field Matching Method for Record Linkage," Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 314-321, 2005.