Long Beach, CA, USA
Mar. 1, 2010 to Mar. 6, 2010
ISBN: 978-1-4244-5445-7
pp: 1193-1196
George Beskales, School of Computer Science, University of Waterloo, Canada
Mohamed A. Soliman, School of Computer Science, University of Waterloo, Canada
Ihab F. Ilyas, School of Computer Science, University of Waterloo, Canada
Shai Ben-David, School of Computer Science, University of Waterloo, Canada
Yubin Kim, School of Computer Science, University of Waterloo, Canada
ABSTRACT
One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input data, by carefully choosing the parameters of the duplicate detection algorithms. Finding the right parameter settings can be hard, and in many cases, perfect settings do not exist. We propose ProbClean, a system that treats duplicate detection procedures as data processing tasks with uncertain outcomes. We use a novel uncertainty model that compactly encodes the space of possible repairs corresponding to different parameter settings. ProbClean efficiently supports relational queries and allows new types of queries against a set of possible repairs.
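The idea of a space of possible repairs indexed by parameter settings can be illustrated with a small sketch. This is not ProbClean's algorithm or its compact uncertainty model, which the abstract does not describe. It is a naive enumeration under stated assumptions: the only parameter is a similarity threshold, clustering is single-linkage, every threshold is weighted equally, and the similarity scores and the names `cluster` and `co_cluster_fraction` are hypothetical.

```python
from itertools import combinations


def cluster(n, sims, threshold):
    """Single-linkage clustering: link records whose similarity >= threshold.

    Returns one cluster label per record (a union-find root).
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in sims.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def co_cluster_fraction(n, sims, thresholds):
    """Fraction of threshold settings under which each pair is merged.

    Assumption: each threshold is one equally weighted possible repair.
    """
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for t in thresholds:
        labels = cluster(n, sims, t)
        for a, b in counts:
            if labels[a] == labels[b]:
                counts[(a, b)] += 1
    return {pair: c / len(thresholds) for pair, c in counts.items()}


# Hypothetical pairwise similarities for three records.
sims = {(0, 1): 0.9, (1, 2): 0.6, (0, 2): 0.3}
thresholds = [0.2, 0.5, 0.8, 0.95]
fractions = co_cluster_fraction(3, sims, thresholds)
```

A query over such a structure could then ask, for example, which pairs are duplicates in at least half of the possible repairs, rather than committing to a single threshold.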
CITATION
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas, Shai Ben-David, Yubin Kim, "ProbClean: A probabilistic duplicate detection system", IEEE International Conference on Data Engineering (ICDE), 2010, pp. 1193-1196, doi:10.1109/ICDE.2010.5447744