22nd International Conference on Data Engineering Workshops (ICDEW'06)
Efficiently Computing Inclusion Dependencies for Schema Discovery
Atlanta, Georgia
April 3-7, 2006
ISBN: 0-7695-2571-7
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDEW.2006.54
Large data integration projects must often cope with undocumented data sources. Schema discovery aims to find structures automatically in such cases. An important class of relationships between attributes that can be detected automatically is the inclusion dependency (IND), which provides an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes.
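As a rough illustration of the last sentence, the following Python sketch tests every ordered pair of attributes for a unary IND by comparing sets of distinct values. The function name `unary_inds`, the dictionary layout, and the sample column names are assumptions made for this example only, as is the choice to ignore NULLs and skip empty attributes. It is a naive in-memory version, not one of the algorithms from the paper.

```python
from itertools import permutations


def unary_inds(columns):
    """Return all (dependent, referenced) attribute pairs where every
    distinct value of the dependent attribute also occurs in the
    referenced attribute.

    `columns` maps an attribute name to a list of its values
    (hypothetical layout chosen for this sketch).
    """
    # Reduce each attribute to its set of distinct non-NULL values.
    distinct = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    found = []
    # Test every ordered pair of different attributes.
    for dep, ref in permutations(distinct, 2):
        # Assumption: an empty attribute is skipped, because it would
        # be trivially included in every other attribute.
        if distinct[dep] and distinct[dep] <= distinct[ref]:
            found.append((dep, ref))
    return found


# Hypothetical sample data: orders.customer_id looks like a foreign key
# into customers.id.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["a", "b", "c", "d"],
}
candidates = unary_inds(columns)
```

Each IND found this way is only a foreign key candidate. The number of pairs to test grows quadratically with the number of attributes, which is why efficiency matters for large schemas.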
In this paper, we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1,680 attributes with a total database size of 3.2 GB in approximately 2.5 hours.
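The abstract does not describe the two algorithms, so the sketch below should not be read as either of them. It only shows one common way to test a single IND candidate outside the database, assuming the distinct values of both attributes have already been exported in sorted order. The function name `included_sorted` and the sample lists are hypothetical.

```python
def included_sorted(dep_values, ref_values):
    """Return True if every value in dep_values occurs in ref_values.

    Both arguments must be iterables of distinct values in ascending
    order (an assumption of this sketch). Each input is read once, so
    they could just as well be streams read from sorted files.
    """
    ref_iter = iter(ref_values)
    for d in dep_values:
        # Advance the referenced side until it reaches or passes d.
        for r in ref_iter:
            if r == d:
                break  # d found; continue with the next dependent value
            if r > d:
                return False  # d is missing, so the candidate fails early
        else:
            return False  # referenced side exhausted before d was found
    return True


# Hypothetical sorted distinct values of two attributes.
holds = included_sorted([1, 3], [1, 2, 3, 4])
fails = included_sorted([1, 5], [1, 2, 3])
```

The early return is the point of this design: a candidate can be rejected at the first missing value without reading the rest of either input.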
Citation:
Jana Bauckmann, Ulf Leser, Felix Naumann, "Efficiently Computing Inclusion Dependencies for Schema Discovery," 22nd International Conference on Data Engineering Workshops (ICDEW'06), p. 2, IEEE Computer Society, 2006.
