21st International Conference on Data Engineering (ICDE'05)
Robust Identification of Fuzzy Duplicates
Tokyo, Japan
April 05-April 08
ISBN: 0-7695-2285-8
ASCII Text:
Surajit Chaudhuri, Venkatesh Ganti, Rajeev Motwani, "Robust Identification of Fuzzy Duplicates," Data Engineering, International Conference on, pp. 865-876, 21st International Conference on Data Engineering (ICDE'05), 2005.

BibTex:
@article{ 10.1109/ICDE.2005.125,
  author = {Surajit Chaudhuri and Venkatesh Ganti and Rajeev Motwani},
  title = {Robust Identification of Fuzzy Duplicates},
  journal = {Data Engineering, International Conference on},
  volume = {0},
  year = {2005},
  issn = {1084-4627},
  pages = {865-876},
  doi = {http://doi.ieeecomputersociety.org/10.1109/ICDE.2005.125},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}

RefWorks Procite/RefMan/Endnote:
TY  - CONF
JO  - Data Engineering, International Conference on
TI  - Robust Identification of Fuzzy Duplicates
SN  - 1084-4627
SP  - 865
EP  - 876
A1  - Surajit Chaudhuri
A1  - Venkatesh Ganti
A1  - Rajeev Motwani
PY  - 2005
KW  - null
VL  - 0
JA  - Data Engineering, International Conference on
ER  -
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2005.125
Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples which represent the same real-world entity. We propose two novel criteria that enable characterization of fuzzy duplicates more accurately than is possible with existing techniques. Using these criteria, we propose a novel framework for the fuzzy duplicate elimination problem. We show that solutions within the new framework result in better accuracy than earlier approaches. We present an efficient algorithm for solving instantiations within the framework. We evaluate it on real datasets to demonstrate the accuracy and scalability of our algorithm.
