Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework
September/October 2006 (vol. 21 no. 5)
pp. 63-71
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MIS.2006.90
Approximate duplicate elimination is an important data-integration task, but it is difficult because it requires complex comparisons across many records in the presence of uncertainty and ambiguity. Earlier approaches required the time-consuming, tedious hard-coding of static rules based on a schema. A novel duplicate-elimination framework now lets users clean data flexibly and effortlessly, without any coding. Fuzzy inference inherently handles the problem's uncertainty, and machine learning capabilities let the framework adapt to the specific notion of similarity appropriate for each domain. The framework is extensible and accommodating, letting the user operate with or without training data. Additionally, many previous duplicate-elimination methods can be implemented quickly using this framework.
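As a rough illustration of how fuzzy inference can grade record similarity instead of applying hard-coded yes/no rules, here is a minimal Python sketch. It is not the authors' framework. The field names, the ramp-shaped "high" membership function and its breakpoints, the two rules, and the 0.5 decision threshold are all assumptions made for this example, and `difflib.SequenceMatcher` stands in for whatever field-level similarity measure a real system would use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crisp string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def high(x: float, lo: float = 0.5, hi: float = 0.9) -> float:
    """Membership of x in the fuzzy set 'high similarity' (linear ramp).

    The breakpoints lo and hi are illustrative, not taken from the article.
    """
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical rules: each tuple lists fields that must ALL be highly similar.
# Rule 1: IF name is high AND address is high THEN duplicate
# Rule 2: IF name is high AND phone is high THEN duplicate
RULES = [("name", "address"), ("name", "phone")]


def duplicate_degree(r1: dict, r2: dict) -> float:
    """Degree in [0, 1] to which two records are duplicates.

    AND within a rule is the minimum; rules are combined (OR) with the maximum.
    """
    fields = {f for rule in RULES for f in rule}
    memberships = {f: high(similarity(r1[f], r2[f])) for f in fields}
    return max(min(memberships[f] for f in rule) for rule in RULES)


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.5) -> bool:
    """Crisp decision obtained by thresholding the fuzzy degree."""
    return duplicate_degree(r1, r2) >= threshold


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "12 Oak Street", "phone": "555-1234"}
    b = {"name": "Jon Smith", "address": "12 Oak St.", "phone": "555-1234"}
    print(duplicate_degree(a, b), is_duplicate(a, b))
```

In this sketch the rules are plain data, so changing the matching policy means editing `RULES` or the membership breakpoints rather than rewriting comparison code. A learning component, which the abstract mentions but this sketch omits, could tune those breakpoints from labeled record pairs.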
Index Terms:
database applications, data mining, knowledge management applications, uncertainty, fuzzy and probabilistic reasoning, data warehouse and repository
Citation:
Hamid Haidarian Shahri, Saied Haidarian Shahri, "Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, vol. 21, no. 5, pp. 63-71, Sept.-Oct. 2006, doi:10.1109/MIS.2006.90

