Issue No. 06 - June (2014 vol. 26)
ISSN: 1041-4347
pp: 1367-1383
Wenfei Fan, Laboratory for Foundations of Computer Science (LFCS), School of Informatics, University of Edinburgh, Informatics Forum 5.23, Edinburgh, U.K.
Jianzhong Li, Department of Computer Science and Engineering, School of Computer Science and Technology, Harbin Institute of Technology, Heilongjiang, China
Nan Tang, Qatar Foundation, Qatar Computing Research Institute (QCRI), Doha, Qatar
Wenyuan Yu, Laboratory for Foundations of Computer Science (LFCS), School of Informatics, University of Edinburgh, Informatics Forum 5.23, Edinburgh, U.K.
ABSTRACT
This paper investigates incremental detection of errors in distributed data. Given a distributed database D, a set Σ of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, the problem is to find, with minimum data shipment, the changes ΔV to V in response to ΔD. The need for the study is evident, since real-life data is often dirty, distributed, and frequently updated, and it is often prohibitively expensive to recompute the entire set of violations when D is updated. We show that the incremental detection problem is NP-complete for a database D that is partitioned either vertically or horizontally, even when Σ and D are fixed. Nevertheless, we show that it is bounded: there exist algorithms to detect errors such that their computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of the size of the database D. We provide such incremental algorithms for vertically partitioned data and horizontally partitioned data, and show that the algorithms are optimal. We further propose optimization techniques for the incremental algorithm over vertical partitions to reduce data shipment. We verify experimentally, using real-life data on Amazon Elastic Compute Cloud (EC2), that our algorithms substantially outperform their batch counterparts.
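To make the idea of computing ΔV from ΔD concrete, the following is a minimal single-site sketch in Python. It is not the paper's algorithm: it assumes one plain functional dependency X → B with no pattern tableau, ignores partitioning and data shipment entirely, and takes V to be the set of tuples that agree with some other tuple on X but differ on B. The class name `IncrementalFDChecker` and its methods are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


class IncrementalFDChecker:
    """Illustrative single-site sketch (not the paper's distributed algorithm).

    Tracks violations of one functional dependency lhs -> rhs and reports,
    for each update, only the change to the violation set.
    """

    def __init__(self, lhs, rhs):
        self.lhs = tuple(lhs)  # attribute names on the left-hand side (X)
        self.rhs = rhs         # attribute name on the right-hand side (B)
        # groups[x_value][b_value] = ids of tuples with those values
        self.groups = defaultdict(lambda: defaultdict(set))
        self.rows = {}         # tuple id -> row, needed to process deletions

    def _key(self, row):
        return tuple(row[a] for a in self.lhs)

    def insert(self, tid, row):
        """Insert a tuple; return the ids added to the violation set."""
        group = self.groups[self._key(row)]
        before = len(group)  # distinct B values before the insertion
        group[row[self.rhs]].add(tid)
        self.rows[tid] = row
        if len(group) < 2:
            return set()     # group is still consistent
        if before < 2:
            # The group just became inconsistent: all its tuples enter V.
            return {t for ids in group.values() for t in ids}
        return {tid}         # group was already inconsistent

    def delete(self, tid):
        """Delete a tuple; return the ids removed from the violation set."""
        row = self.rows.pop(tid)
        key = self._key(row)
        group = self.groups[key]
        before = len(group)  # distinct B values before the deletion
        group[row[self.rhs]].discard(tid)
        if not group[row[self.rhs]]:
            del group[row[self.rhs]]
        if before < 2:
            removed = set()  # the tuple was never a violation
        elif len(group) < 2:
            # The group became consistent: the remaining tuples leave V too.
            removed = {tid} | {t for ids in group.values() for t in ids}
        else:
            removed = {tid}
        if not group:
            del self.groups[key]
        return removed
```

In this sketch each update touches only the group that shares the updated tuple's X value, and a whole group is enumerated only when every tuple in it enters or leaves V, so the work per update depends on the update and on the resulting change to V rather than on the size of the table. That is the flavor of the boundedness result stated above, under the simplifying assumptions listed.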
INDEX TERMS
optimisation, computational complexity, distributed algorithms, distributed databases
CITATION

W. Fan, J. Li, N. Tang and W. Yu, "Incremental Detection of Inconsistencies in Distributed Data," in IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 6, pp. 1367-1383, 2014.
doi:10.1109/TKDE.2012.138