2017 IEEE 33rd International Conference on Data Engineering (2017)
San Diego, California, USA
April 19, 2017 to April 22, 2017
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2017.141
We study the data cleaning problem of detecting and repairing wrong relational data, as well as marking correct data, using well-curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rule that can make actionable decisions on relational data by building connections between a relation and a KB. The main novelty is that a DR simultaneously models two opposite semantics of a relation using types and relationships in a KB: the positive semantics, which explains how attribute values are linked to each other in correct tuples, and the negative semantics, which indicates how wrong attribute values are connected to other correct attribute values within the same tuples. Naturally, a DR can mark values in a tuple as correct if the tuple matches the positive semantics. Conversely, a DR can detect and repair an error if the tuple matches the negative semantics. We study fundamental problems associated with DRs, e.g., rule generation and rule consistency. We present efficient algorithms to apply DRs to clean a relation, based on rule order selection and inverted indexes. Extensive experiments, using both real-world and synthetic datasets, verify the effectiveness and efficiency of applying DRs in practice.
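To make the positive/negative semantics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy KB stored as (subject, predicate, object) triples and a single hypothetical rule over a two-attribute relation; the entities, the predicates `worksIn` and `bornIn`, and the helper names `objects` and `apply_rule` are all invented for illustration.

```python
# Toy knowledge base as (subject, predicate, object) triples.
# All entities and predicates are made up for illustration.
KB = {
    ("Alice", "worksIn", "Paris"),
    ("Alice", "bornIn", "Lyon"),
    ("Bob", "worksIn", "Berlin"),
    ("Bob", "bornIn", "Munich"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the KB."""
    return {o for (s, p, o) in KB if s == subject and p == predicate}


def apply_rule(row, key, target, positive, negative):
    """Apply one hypothetical detective-style rule to a tuple (a dict).

    positive: KB predicate that links row[key] to a correct row[target]
    negative: KB predicate that links row[key] to a typical wrong row[target]
    Returns (status, row) with status "correct", "repaired", or "unknown".
    """
    good = objects(row[key], positive)
    bad = objects(row[key], negative)
    if row[target] in good:
        # Positive semantics matched: the value can be marked as correct.
        return "correct", row
    if row[target] in bad and len(good) == 1:
        # Negative semantics matched and the KB offers exactly one
        # candidate under the positive semantics: repair with it.
        fixed = dict(row)
        fixed[target] = next(iter(good))
        return "repaired", fixed
    # Neither semantics matched: make no decision.
    return "unknown", row
```

For example, under these assumptions `apply_rule({"name": "Alice", "city": "Lyon"}, "name", "city", "worksIn", "bornIn")` treats `Lyon` as an error (a birthplace recorded where a workplace was expected) and repairs it to `Paris`, whereas a tuple whose city matches neither predicate is left untouched.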
Cleaning, Semantics, Chemistry, Urban areas, Maintenance engineering, Knowledge based systems, Integrated circuits
S. Hao, N. Tang, G. Li and J. Li, "Cleaning Relations Using Knowledge Bases," 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, California, USA, 2017, pp. 933-944.