Issue No. 08 - Aug. (2017 vol. 29)
Marcin Wylot , Open Distributed Systems, TU Berlin/Fraunhofer FOKUS, Berlin, Germany
Philippe Cudre-Mauroux , eXascale Infolab, University of Fribourg, Fribourg, Switzerland
Manfred Hauswirth , Open Distributed Systems, TU Berlin/Fraunhofer FOKUS, Berlin, Germany
Paul Groth , Elsevier Labs, Amsterdam, NX, The Netherlands
The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triplestores. We present methods that extend a native RDF store to efficiently handle the storage, tracking, and querying of provenance in RDF data. We describe a reliable and understandable specification of the way query results were derived from the data and how particular pieces of data were combined to answer a query. We then present techniques for tailoring queries with provenance data. We empirically evaluate these methods and show that the overhead of storing and tracking provenance is acceptable. Finally, we show that tailoring a query with provenance information can also significantly improve the performance of query execution.
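To make the two ideas in the abstract concrete (tracking how pieces of data combine to answer a query, and tailoring a query with provenance), the following Python sketch annotates each triple with its source, evaluates a two-pattern join, and records which sources each result was derived from. It is a toy illustration under stated assumptions, not the authors' system. The quad layout, the `ex:` identifiers, the source names `g1` to `g4`, and the functions `match`, `alternatives`, and `works_in_city` are all hypothetical. The `+` (alternative sources) and `*` (sources combined by a join) notation only loosely follows provenance-polynomial conventions.

```python
from collections import defaultdict

# Illustrative sketch only; every name below is hypothetical.
# Toy dataset of (subject, predicate, object, source) quads, where the
# fourth element names the source (e.g., a named graph) of each triple.
QUADS = [
    ("ex:alice", "ex:worksFor", "ex:acme", "g1"),
    ("ex:alice", "ex:worksFor", "ex:acme", "g2"),
    ("ex:acme", "ex:locatedIn", "ex:berlin", "g3"),
    ("ex:bob", "ex:worksFor", "ex:initech", "g1"),
    ("ex:initech", "ex:locatedIn", "ex:paris", "g4"),
]


def match(predicate, quads):
    """Group triples with this predicate by (subject, object), keeping every source."""
    groups = defaultdict(set)
    for s, p, o, g in quads:
        if p == predicate:
            groups[(s, o)].add(g)
    return groups


def alternatives(sources):
    """Render sources that each yield the same triple, e.g. '(g1 + g2)'."""
    terms = sorted(sources)
    return terms[0] if len(terms) == 1 else "(" + " + ".join(terms) + ")"


def works_in_city(quads, allowed_sources=None):
    """Evaluate ?p ex:worksFor ?org . ?org ex:locatedIn ?city with lineage.

    Returns {(person, city): provenance expression}. If allowed_sources is
    given, the query is tailored by discarding quads from other sources
    before the join is evaluated.
    """
    if allowed_sources is not None:
        quads = [q for q in quads if q[3] in allowed_sources]
    works = match("ex:worksFor", quads)
    located = match("ex:locatedIn", quads)
    derivations = defaultdict(list)
    for (person, org), work_src in works.items():
        for (org2, city), loc_src in located.items():
            if org == org2:
                # The two joined triples are combined with '*'.
                derivations[(person, city)].append(
                    f"{alternatives(work_src)} * {alternatives(loc_src)}"
                )
    # Distinct derivations of the same result are alternatives, joined with '+'.
    return {key: " + ".join(terms) for key, terms in derivations.items()}


if __name__ == "__main__":
    for result, prov in sorted(works_in_city(QUADS).items()):
        print(result, prov)
    # Tailored query: only trust sources g2, g3, and g4.
    print(works_in_city(QUADS, allowed_sources={"g2", "g3", "g4"}))
```

Without tailoring, the result `("ex:alice", "ex:berlin")` carries the expression `(g1 + g2) * g3`: either `g1` or `g2` supplies the employment triple, and `g3` supplies the location triple. Filtering by source before the join is only one possible tailoring strategy, chosen here for brevity. It also shrinks the join input, which hints at why restricting a query by provenance can reduce execution cost.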
Resource description framework, W3C, Triples (Data structure), Query processing, RDF, linked data, triplestores, BigData, provenance
Marcin Wylot, Philippe Cudre-Mauroux, Manfred Hauswirth, Paul Groth, "Storing, Tracking, and Querying Provenance in Linked Data", IEEE Transactions on Knowledge & Data Engineering, vol. 29, no. 8, pp. 1751-1764, Aug. 2017, doi:10.1109/TKDE.2017.2690299