Sudipto Guha, Nick Koudas, Divesh Srivastava, Xiaohui Yu. "Reasoning About Approximate Match Query Results." 22nd International Conference on Data Engineering (ICDE'06), pp. 8. Los Alamitos, CA, USA: IEEE Computer Society, 2006. ISBN 0-7695-2570-9. doi:10.1109/ICDE.2006.128
In this paper, we consider the problem of estimating various parameters of the output of declarative approximate join algorithms for planning purposes. Such algorithms are highly time-consuming, so precise knowledge of the result size, as well as of its score distribution, is a pressing concern. This knowledge helps decide which operations are most promising for identifying highly similar tuples, a key operation in data cleaning. We propose solution strategies that fully comply with a declarative framework, and we analytically reason about both the quality of the resulting estimates and the performance of our strategies.
We present the results of a detailed performance evaluation of all proposed strategies. Our experimental results validate our analytical expectations and shed additional light on the quality and performance of our estimation framework. Our study offers a set of simple, fully declarative techniques for this problem that can be readily deployed in data cleaning systems.
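To make the estimation problem concrete, here is a minimal sketch. It is not the authors' declarative strategies. It is a generic estimator that approximates the result size and score distribution of an approximate join by sampling tuple pairs uniformly from R × S. It assumes Jaccard similarity over character 2-grams as the match score and a fixed similarity threshold. The function names qgrams, jaccard, exact_join_size and estimate_join, and the sample data, are hypothetical.

```python
import random
from collections import Counter


def qgrams(s: str, q: int = 2) -> set[str]:
    """Set of character q-grams of s, padded with '#' so string ends count too."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact_join_size(r: list[str], s: list[str], threshold: float, q: int = 2) -> int:
    """Nested-loop approximate join: count pairs (x, y) with similarity >= threshold."""
    r_grams = [qgrams(x, q) for x in r]
    s_grams = [qgrams(y, q) for y in s]
    return sum(1 for gx in r_grams for gy in s_grams if jaccard(gx, gy) >= threshold)


def estimate_join(r, s, threshold, samples=1000, bins=10, q=2, seed=0):
    """Estimate the join result size and score histogram from uniformly sampled pairs.

    Returns (estimated_size, {bucket_index: estimated_pair_count}), where bucket i
    roughly covers scores in [i / bins, (i + 1) / bins) and a score of 1.0 falls
    into the last bucket.
    """
    total = len(r) * len(s)  # size of the cross product R x S
    if total == 0:
        return 0.0, {}
    rng = random.Random(seed)
    r_grams = [qgrams(x, q) for x in r]
    s_grams = [qgrams(y, q) for y in s]
    hist = Counter()
    hits = 0
    for _ in range(samples):
        # Choosing each side independently gives a uniform pair from R x S.
        score = jaccard(rng.choice(r_grams), rng.choice(s_grams))
        hist[min(int(score * bins), bins - 1)] += 1
        if score >= threshold:
            hits += 1
    # Scale the sample counts up to the full cross product.
    est_size = hits * total / samples
    est_hist = {b: c * total / samples for b, c in sorted(hist.items())}
    return est_size, est_hist


if __name__ == "__main__":
    # Hypothetical dirty name lists to be matched approximately.
    r = ["john smith", "maria garcia", "wei zhang"]
    s = ["jon smith", "john smyth", "maria garcía", "zhang wei", "ann lee"]
    print("exact size:", exact_join_size(r, s, threshold=0.5))
    size, hist = estimate_join(r, s, threshold=0.5, samples=5000)
    print("estimated size:", size)
    print("estimated score histogram:", hist)
```

Each sampled pair is drawn uniformly from R × S, so the scaled hit count is an unbiased estimate of the true result size. Its relative error shrinks as the number of samples grows. It can still be poor for very selective thresholds, because few sampled pairs then qualify. This is one reason cheaper or better-targeted estimation strategies matter for planning.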
