2012 IEEE 28th International Conference on Data Engineering (ICDE)
Arlington, Virginia USA
Apr. 1, 2012 to Apr. 5, 2012
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2012.15
Uncertainties in data can arise for a number of reasons: when data is incomplete, contains conflicting information, or has been deliberately perturbed or coarsened to remove sensitive details. An important case which arises in many real applications is when the data describes a set of possibilities, but with cardinality constraints. These constraints represent correlations between tuples, encoding, e.g., that at most two possible records are correct, or that there is an (unknown) one-to-one mapping between a set of tuples and attribute values. Although there has been much effort to handle uncertain data, current systems are not equipped to handle such correlations beyond simple mutual exclusion and co-existence constraints. Critically, they have little support for efficiently handling aggregate queries on such data. In this paper, we aim to address some of these deficiencies by introducing LICM (Linear Integer Constraint Model), which can succinctly represent many types of tuple correlations, particularly a class of cardinality constraints. We motivate and explain the model with examples from data cleaning and masking sensitive data, to show that it enables modeling and querying of such data, which was not previously possible. We develop an efficient strategy to answer conjunctive and aggregate queries on possibilistic data by describing how to implement relational operators over data in the model. LICM compactly integrates the encoding of correlations, query answering, and lineage recording. In combination with off-the-shelf linear integer programming solvers, our approach provides exact bounds for aggregate queries. Our prototype implementation demonstrates that query answering with LICM can be effective and scalable.
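To make the idea of cardinality constraints over possible tuples concrete, the following is a minimal sketch and not the paper's implementation. It assumes each uncertain tuple has a 0/1 existence variable, expresses each constraint as a bounded linear sum over those variables, and finds exact bounds for a SUM aggregate by brute-force enumeration of possible worlds rather than by calling an integer programming solver. The function name `aggregate_bounds`, the constraint encoding, and the sample data are all hypothetical.

```python
from itertools import product


def aggregate_bounds(values, constraints):
    """Return (min, max) of SUM over all possible worlds.

    values[i] is the amount tuple i contributes if it exists.
    constraints is a list of (coeffs, lo, hi), each meaning
    lo <= sum(coeffs[i] * x[i]) <= hi, where x[i] in {0, 1}
    says whether tuple i exists. (Hypothetical encoding.)
    """
    totals = []
    # Enumerate every 0/1 assignment; exponential, so only for tiny inputs.
    for x in product((0, 1), repeat=len(values)):
        satisfied = all(
            lo <= sum(c * xi for c, xi in zip(coeffs, x)) <= hi
            for coeffs, lo, hi in constraints
        )
        if satisfied:
            totals.append(sum(v * xi for v, xi in zip(values, x)))
    if not totals:
        raise ValueError("constraints admit no possible world")
    return min(totals), max(totals)


# Illustrative data: four candidate records with numeric values.
values = [10, 20, 30, 40]
constraints = [
    ([1, 1, 1, 0], 1, 2),  # at least one, at most two of records 0..2 are correct
    ([0, 0, 0, 1], 1, 1),  # record 3 certainly exists
]
bounds = aggregate_bounds(values, constraints)
print(bounds)
```

Enumeration is used here only to keep the sketch self-contained; for realistic sizes the same constraints and objective would be handed to an integer programming solver, which is the role the abstract assigns to off-the-shelf solvers.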
Graham Cormode, Entong Shen, Divesh Srivastava, Ting Yu, "Aggregate Query Answering on Possibilistic Data with Cardinality Constraints", 2012 IEEE 28th International Conference on Data Engineering (ICDE), pp. 258-269, 2012, doi:10.1109/ICDE.2012.15