Profile-Based Object Matching for Information Integration
September/October 2003 (vol. 18 no. 5)
pp. 54-59
AnHai Doan, University of Illinois, Urbana-Champaign
Ying Lu, University of Illinois, Urbana-Champaign
Yoonkyong Lee, University of Illinois, Urbana-Champaign
Jiawei Han, University of Illinois, Urbana-Champaign

Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes, as when matching tuples from relational tables with different schemas such as (age, name) and (name, salary). Our solution, Profile-Based Object Matching (PROM), exploits these disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which contains some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see whether, taken together, it plausibly describes a single person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Domain experts can specify profilers manually, or profilers can be trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in that it not only exploits disjoint attributes to improve matching accuracy but also facilitates knowledge reuse from previous object-matching tasks.
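The two-stage process described above (match on a shared attribute, then let profilers vet the combined tuple) can be illustrated with a minimal sketch. This is not the authors' implementation. The string similarity (Python's `difflib.SequenceMatcher`), the function names `name_similarity`, `child_salary_profiler`, `age_range_profiler`, and `match_tables`, the thresholds, and the rule that every profiler must accept a pair are all hypothetical choices made for illustration. Here the profilers are hand-written rules, the "manually specified by domain experts" case; learned or transferred profilers would plug into the same interface.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Hypothetical interface: a profiler scores a merged record's plausibility in [0, 1].
Profiler = Callable[[Record], float]


def name_similarity(a: str, b: str) -> float:
    """Similarity on the shared attribute (difflib ratio is an assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def child_salary_profiler(person: Record) -> float:
    """Hand-written rule: a young child does not earn a large salary."""
    age, salary = person.get("age"), person.get("salary")
    if age is None or salary is None:
        return 1.0  # not enough evidence to object
    return 0.0 if age < 16 and salary > 50_000 else 1.0


def age_range_profiler(person: Record) -> float:
    """Hand-written rule: ages outside 0-120 are implausible."""
    age = person.get("age")
    return 1.0 if age is None or 0 <= age <= 120 else 0.0


def match_tables(
    left: List[Record],
    right: List[Record],
    profilers: List[Profiler],
    sim_threshold: float = 0.8,    # hypothetical cutoff for the shared attribute
    plaus_threshold: float = 0.5,  # hypothetical cutoff for each profiler
) -> List[Tuple[int, int]]:
    """Return (left index, right index) pairs judged to describe the same person."""
    matches = []
    for i, l in enumerate(left):
        for j, r in enumerate(right):
            # Stage 1: compare the shared attribute (name).
            if name_similarity(str(l["name"]), str(r["name"])) < sim_threshold:
                continue
            # Stage 2: merge the pair, combining its disjoint attributes,
            # and keep it only if every profiler finds it plausible.
            merged = {**l, **r}
            if all(p(merged) >= plaus_threshold for p in profilers):
                matches.append((i, j))
    return matches


# Tables with schemas (name, age) and (name, salary).
people_a = [{"name": "Mike Smith", "age": 6}, {"name": "Jane Doe", "age": 41}]
people_b = [{"name": "Mike Smith", "salary": 100_000}, {"name": "Jane  Doe", "salary": 85_000}]
print(match_tables(people_a, people_b, [child_salary_profiler, age_range_profiler]))
# prints [(1, 1)]
```

The two "Mike Smith" tuples agree perfectly on name, yet the child-salary profiler rejects the merged 6-year-old earning $100,000, mirroring the example in the abstract. The "Jane Doe" pair survives a spacing difference in the name because the shared-attribute comparison is fuzzy rather than exact.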

Index Terms:
object matching, tuple deduplication, record linkage, data cleaning, data integration
AnHai Doan, Ying Lu, Yoonkyong Lee, Jiawei Han, "Profile-Based Object Matching for Information Integration," IEEE Intelligent Systems, vol. 18, no. 5, pp. 54-59, Sept.-Oct. 2003, doi:10.1109/MIS.2003.1234770