Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework
September/October 2006 (vol. 21 no. 5)
pp. 63-71
Hamid Haidarian Shahri, University of Maryland
Saied Haidarian Shahri, University of Tehran
Approximate duplicate elimination is an important data-integration task, but the many record comparisons it requires, each involving uncertainty and ambiguity, make it difficult. Earlier approaches demanded a time-consuming, tedious process of hard-coding static rules based on a schema. A novel duplicate-elimination framework now lets users clean data flexibly and effortlessly, without any coding. It exploits fuzzy inference to inherently handle the problem's uncertainty, and its machine-learning capabilities let it adapt to the notion of similarity appropriate for each domain. The framework is extensible and accommodating, letting the user operate with or without training data. Additionally, many previous duplicate-elimination methods can be implemented quickly using this framework.
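To make the fuzzy-inference idea concrete, the following minimal sketch illustrates how a fuzzy rule might score a candidate record pair: crisp attribute similarities are fuzzified into linguistic terms such as HIGH, and rules are combined with min (fuzzy AND). This is an illustrative assumption, not the paper's implementation; the attribute names, membership function, and Jaccard similarity measure are all hypothetical.

    # Hypothetical sketch of fuzzy-rule-based duplicate scoring.
    # Not the authors' framework; all names and functions are illustrative.

    def triangular(x, a, b, c):
        """Triangular fuzzy membership function peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def token_similarity(s1, s2):
        """Crisp similarity in [0, 1]: simple token-overlap (Jaccard)."""
        t1, t2 = set(s1.lower().split()), set(s2.lower().split())
        return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 1.0

    def duplicate_degree(rec_a, rec_b):
        """Fire one fuzzy rule on a record pair.

        Rule: IF name similarity is HIGH AND address similarity is HIGH
              THEN the pair's duplicate degree is HIGH.
        Fuzzy AND is min(); multiple rules would be aggregated with max().
        """
        sim_name = token_similarity(rec_a["name"], rec_b["name"])
        sim_addr = token_similarity(rec_a["address"], rec_b["address"])
        high_name = triangular(sim_name, 0.5, 1.0, 1.5)  # "HIGH" fuzzy set
        high_addr = triangular(sim_addr, 0.5, 1.0, 1.5)
        return min(high_name, high_addr)  # rule firing strength

    if __name__ == "__main__":
        a = {"name": "John A. Smith", "address": "12 Oak St Springfield"}
        b = {"name": "John Smith", "address": "12 Oak Street Springfield"}
        print(f"duplicate degree: {duplicate_degree(a, b):.2f}")

In the framework the paper describes, the learning component would tune such membership functions and rule weights per domain, rather than leaving them hand-set as above.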
Index Terms:
database applications, data mining, knowledge management applications, uncertainty, fuzzy and probabilistic reasoning, data warehouse and repository
Citation:
Hamid Haidarian Shahri, Saied Haidarian Shahri, "Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, vol. 21, no. 5, pp. 63-71, Sept.-Oct. 2006, doi:10.1109/MIS.2006.90