AnHai Doan, Ying Lu, Yoonkyong Lee, Jiawei Han, "Profile-Based Object Matching for Information Integration," IEEE Intelligent Systems, vol. 18, no. 5, pp. 54-59, September/October 2003.
DOI: 10.1109/MIS.2003.1234770
ISSN: 1541-1672
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Keywords: object matching, tuple deduplication, record linkage, data cleaning, data integration
Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions assume that the objects to be matched share the same attribute set and that systems can match them by comparing attribute similarities. Our work addresses the more general problem in which objects also have disjoint attributes, for example, matching tuples from relational tables that have different schemas, such as (age, name) and (name, salary). Our approach, Profile-Based Object Matching (PROM), exploits these disjoint attributes to improve matching accuracy. PROM first matches any two tuples based on a shared attribute, such as name. It then applies a set of profilers, each of which contains some knowledge about what constitutes a typical person. The profilers examine the tuple pair to see whether it plausibly describes a person. A profiler might state, for example, that if the pair produces a person with an age of 6 and a salary of $100,000, the pair doesn't describe a real person, so the tuples don't match. Profilers can be manually specified by domain experts, trained on training data, transferred from other matching tasks, or built from external data. PROM is thus distinct in that it not only exploits disjoint attributes to improve matching accuracy but also facilitates knowledge reuse from previous object-matching tasks.
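The two-step flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`names_match`, `child_salary_profiler`, `prom_match`), the exact-match rule for names, the numeric thresholds, and the choice to let any single profiler veto a pair are all assumptions made for illustration. The abstract does not say how PROM compares shared attributes or how it combines profiler outputs.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# A profiler returns True when the merged record plausibly describes a person.
Profiler = Callable[[Record], bool]


def names_match(a: Record, b: Record) -> bool:
    """Step 1 (assumed rule): case-insensitive exact match on the shared attribute."""
    return str(a["name"]).strip().lower() == str(b["name"]).strip().lower()


def child_salary_profiler(person: Record) -> bool:
    """Hypothetical hand-written profiler: a young child with a high salary is implausible.

    The thresholds (age under 16, salary over 50,000) are illustrative assumptions.
    """
    age = person.get("age")
    salary = person.get("salary")
    if age is None or salary is None:
        return True  # nothing to check, so do not reject the pair
    return not (age < 16 and salary > 50_000)


def prom_match(a: Record, b: Record, profilers: List[Profiler]) -> bool:
    """Match on the shared attribute, then let each profiler inspect the merged record."""
    if not names_match(a, b):
        return False
    merged = {**a, **b}  # combine the disjoint attributes into one candidate person
    # Assumed combination rule: every profiler must accept the merged record.
    return all(profiler(merged) for profiler in profilers)


if __name__ == "__main__":
    left = {"age": 6, "name": "Mike Smith"}              # schema (age, name)
    right = {"name": "Mike Smith", "salary": 100_000}    # schema (name, salary)
    print(prom_match(left, right, [child_salary_profiler]))
```

With the abstract's own example (age 6, salary $100,000), the names agree but the profiler rejects the merged record, so the pair is reported as a non-match. Because profilers are plain functions over a merged record, one written for a previous matching task could be passed in unchanged, which is the kind of knowledge reuse the abstract points to.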

