Issue No. 05 - September/October (2003 vol. 18)
AnHai Doan, University of Illinois, Urbana-Champaign
Ying Lu, University of Illinois, Urbana-Champaign
Yoonkyong Lee, University of Illinois, Urbana-Champaign
Jiawei Han, University of Illinois, Urbana-Champaign
<p>Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes, for example, matching tuples from relational tables that have different schemas, such as (age, name) and (name, salary). Profile-Based Object Matching (PROM) exploits these disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which contains some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see whether it plausibly describes a person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Profilers can be manually specified by domain experts, trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in two ways: it exploits disjoint attributes to improve matching accuracy, and it facilitates knowledge reuse from previous object-matching tasks.</p>
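<p>The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes exact equality on the shared attribute (a real matcher would likely use a similarity measure), and the names `merge`, `no_high_earning_child`, and `prom_match`, along with the age cutoff of 16, are hypothetical choices made only for illustration.</p>

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# A profiler returns True when the merged record plausibly describes a person.
Profiler = Callable[[Record], bool]


def merge(left: Record, right: Record) -> Record:
    """Combine two tuples into one candidate record (hypothetical helper)."""
    merged = dict(left)
    merged.update(right)
    return merged


def no_high_earning_child(person: Record) -> bool:
    """Hand-written profiler: reject a young child with a large salary.

    The cutoffs (age < 16, salary >= 100,000) are assumptions for this sketch.
    """
    age = person.get("age")
    salary = person.get("salary")
    if age is None or salary is None:
        return True  # nothing to check, so do not veto the pair
    return not (age < 16 and salary >= 100_000)


def prom_match(
    left_rows: List[Record],
    right_rows: List[Record],
    shared: str,
    profilers: List[Profiler],
) -> List[Tuple[Record, Record]]:
    """Step 1: pair tuples on the shared attribute. Step 2: let profilers veto."""
    matches = []
    for left in left_rows:
        for right in right_rows:
            if left[shared] != right[shared]:  # assumption: exact equality
                continue
            candidate = merge(left, right)
            if all(profiler(candidate) for profiler in profilers):
                matches.append((left, right))
    return matches


# Tables with schemas (age, name) and (name, salary), as in the abstract.
ages = [{"name": "Mike Smith", "age": 6}, {"name": "Mike Smith", "age": 45}]
salaries = [{"name": "Mike Smith", "salary": 100_000}]

matches = prom_match(ages, salaries, "name", [no_high_earning_child])
```

<p>In this sketch both age tuples share a name with the salary tuple, but the profiler vetoes the pairing with the 6-year-old, so only the 45-year-old remains as a match. Profilers that are trained or built from external data would plug in through the same `Profiler` signature.</p>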
object matching, tuple deduplication, record linkage, data cleaning, data integration
Y. Lu, Y. Lee, J. Han and A. Doan, "Profile-Based Object Matching for Information Integration," in IEEE Intelligent Systems, vol. 18, no. 5, pp. 54-59, 2003.