Issue No. 06 - June (2014 vol. 26)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2012.171
Lukasz Golab, Department of Engineering, University of Waterloo, Waterloo, ON, Canada
Howard Karloff, AT&T Labs-Research, Florham Park, NJ, USA
Flip Korn, AT&T Labs-Research, Florham Park, NJ, USA
Barna Saha, AT&T Labs-Research, Florham Park, NJ, USA
Divesh Srivastava, AT&T Labs-Research, Florham Park, NJ, USA
Many applications process data in which there exists a “conservation law” between related quantities. For example, in traffic monitoring, every incoming event, such as a packet's entering a router or a car's entering an intersection, should ideally have an immediate outgoing counterpart. We propose a new class of constraints, Conservation Rules, that express the semantics and characterize the data quality of such applications. We give confidence metrics that quantify how strongly a conservation rule holds and present approximation algorithms (with error guarantees) for the problem of discovering a concise summary of subsets of the data that satisfy a given conservation rule. Using real data, we demonstrate the utility of conservation rules and we show order-of-magnitude performance improvements of our discovery algorithms over naive approaches.
data mining, approximation theory
L. Golab, H. Karloff, F. Korn, B. Saha and D. Srivastava, "Discovering Conservation Rules," in IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 6, pp. 1332-1348, 2014.