2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
ProbClean: A probabilistic duplicate detection system
Long Beach, CA, USA
March 1-6, 2010
ISBN: 978-1-4244-5445-7
Authors: George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David, Yubin Kim
Pages: 1193-1196
DOI: 10.1109/ICDE.2010.5447744
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input data, by carefully choosing the parameters of the duplicate detection algorithms. Finding the right parameter settings can be hard, and in many cases, perfect settings do not exist. We propose ProbClean, a system that treats duplicate detection procedures as data processing tasks with uncertain outcomes. We use a novel uncertainty model that compactly encodes the space of possible repairs corresponding to different parameter settings. ProbClean efficiently supports relational queries and allows new types of queries against a set of possible repairs.
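To make the idea of "a space of possible repairs corresponding to different parameter settings" concrete, here is a minimal sketch. It is not ProbClean's uncertainty model or algorithm, which the abstract does not detail. It assumes a toy setup: pairwise similarity scores, a single similarity threshold as the only parameter, single-linkage clustering, and equal weight for every threshold. All names (`cluster`, `possible_repairs`, `duplicate_probability`, `SIM`, `THRESHOLDS`) are hypothetical.

```python
def cluster(n, sim, threshold):
    """Single-linkage clustering of records 0..n-1 (an assumed stand-in
    for a duplicate detection algorithm with one tunable parameter)."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair whose similarity reaches the threshold.
    for (i, j), score in sim.items():
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for record in range(n):
        groups.setdefault(find(record), []).append(record)
    return frozenset(frozenset(g) for g in groups.values())


def possible_repairs(n, sim, thresholds):
    """One repair (a clustering of the records) per parameter setting."""
    return {t: cluster(n, sim, t) for t in thresholds}


def duplicate_probability(repairs, a, b):
    """Fraction of repairs in which records a and b land in one cluster,
    assuming every parameter setting is equally likely."""
    hits = sum(
        any(a in c and b in c for c in repair) for repair in repairs.values()
    )
    return hits / len(repairs)


# Hypothetical similarity scores between four records.
SIM = {(0, 1): 0.9, (1, 2): 0.6, (0, 2): 0.3, (2, 3): 0.2}
THRESHOLDS = [0.25, 0.5, 0.75, 0.95]
REPAIRS = possible_repairs(4, SIM, THRESHOLDS)

print(duplicate_probability(REPAIRS, 0, 1))  # 0.75
print(duplicate_probability(REPAIRS, 0, 2))  # 0.5
```

Instead of committing to one threshold, a query such as "how likely are records 0 and 1 to be duplicates?" is answered across all repairs. This sketch enumerates the repairs explicitly, whereas the abstract describes a model that encodes them compactly.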
Citation:
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David, Yubin Kim, "ProbClean: A probabilistic duplicate detection system," in 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), pp. 1193-1196, 2010.
