Issue No. 05 - September/October (2003 vol. 18)
ISSN: 1541-1672
pp: 54-59
Ying Lu, University of Illinois, Urbana-Champaign
Yoonkyong Lee, University of Illinois, Urbana-Champaign
Jiawei Han, University of Illinois, Urbana-Champaign
AnHai Doan, University of Illinois, Urbana-Champaign
<p>Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes: for example, matching tuples from relational tables that have different schemas, such as (age, name) and (name, salary). Our approach, Profile-Based Object Matching (PROM), exploits these disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which encodes some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see whether it plausibly describes a person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Profilers can be manually specified by domain experts, trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in that it not only exploits disjoint attributes to improve matching accuracy but also facilitates knowledge reuse from previous object-matching tasks.</p>
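The two-stage idea in the abstract (match on a shared attribute, then let profilers veto implausible pairs) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exact-match comparison on a normalized name, and the age and salary thresholds are all assumptions chosen for illustration, and the only profiler shown is a hand-written rule rather than a trained or transferred one.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
# A profiler looks at the merged candidate "person" and returns False
# when the combination of attributes is implausible (hypothetical interface).
Profiler = Callable[[Record], bool]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different names compare equal."""
    return " ".join(name.lower().split())


def age_salary_profiler(person: Record) -> bool:
    """Hand-written rule: a young child with a high salary is not a plausible person.

    The thresholds (16 years, 50,000) are illustrative assumptions.
    """
    age = person.get("age")
    salary = person.get("salary")
    if age is None or salary is None:
        return True  # nothing to check, so do not veto
    return not (age < 16 and salary >= 50_000)


def match_tuples(
    left: Iterable[Record],
    right: Iterable[Record],
    shared: str = "name",
    profilers: Iterable[Profiler] = (),
) -> List[Tuple[Record, Record]]:
    """Stage 1: pair tuples whose shared attribute agrees.
    Stage 2: keep a pair only if every profiler finds the merged tuple plausible.
    """
    right = list(right)
    profilers = list(profilers)
    matches = []
    for l in left:
        for r in right:
            if normalize(l[shared]) != normalize(r[shared]):
                continue  # fails the shared-attribute match
            merged = {**l, **r}  # combine the disjoint attributes
            if all(profile(merged) for profile in profilers):
                matches.append((l, r))
    return matches
```

With a table of (age, name) tuples and a table of (name, salary) tuples, a 6-year-old "Mike Smith" paired with a "Mike Smith" earning $100,000 passes the name match but is rejected by the profiler, while a 42-year-old with the same name is kept.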
object matching, tuple deduplication, record linkage, data cleaning, data integration
Ying Lu, Yoonkyong Lee, Jiawei Han, AnHai Doan, "Profile-Based Object Matching for Information Integration", IEEE Intelligent Systems, vol. 18, no. 5, pp. 54-59, September/October 2003, doi:10.1109/MIS.2003.1234770