2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
Semantic inferencing and querying across large-scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google's MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF molecules appear to offer an ideal approach, providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details of our MapReduce-based RDF molecule store. Finally, we evaluate the benefits of our approach in the context of the Bio-MANTA project, an application that requires integration and querying across large-scale protein-protein interaction datasets.
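To make the idea of "decomposing an RDF graph into molecules and distributing them across nodes" concrete, the following is a minimal sketch under simplifying assumptions. It is not the authors' store and does not implement their hierarchical or ordered molecules. It assumes a naive decomposition in which a triple containing no blank nodes forms a molecule on its own, and triples that share a blank node are grouped together (computed here with union-find). Blank nodes are assumed to be strings starting with `_:`. The functions `decompose`, `map_phase` and `shuffle`, the `ex:` vocabulary, and the choice of partitioning key (a CRC32 hash of the molecule's first subject) are all hypothetical, and the map and shuffle steps are simulated in-process rather than run on a real MapReduce cluster.

```python
from collections import defaultdict
from zlib import crc32

# A triple is (subject, predicate, object); terms starting with "_:" are
# treated as blank nodes (an assumption of this sketch).
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def decompose(triples: list[Triple]) -> list[list[Triple]]:
    """Naive molecule decomposition: triples sharing a blank node are grouped
    into one molecule; a triple with no blank nodes is a molecule by itself."""
    parent = list(range(len(triples)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    first_seen: dict[str, int] = {}  # blank node -> first triple index using it
    for idx, (s, _, o) in enumerate(triples):
        for term in (s, o):
            if is_blank(term):
                if term in first_seen:
                    union(idx, first_seen[term])
                else:
                    first_seen[term] = idx

    groups: dict[int, list[Triple]] = defaultdict(list)
    for idx, t in enumerate(triples):
        groups[find(idx)].append(t)
    return list(groups.values())


def map_phase(molecules: list[list[Triple]], num_nodes: int):
    """Emit (node_id, molecule) pairs, partitioning each molecule by a stable
    hash of its first subject (a hypothetical partitioning key)."""
    for m in molecules:
        key = m[0][0]
        yield crc32(key.encode("utf-8")) % num_nodes, m


def shuffle(pairs) -> dict[int, list[list[Triple]]]:
    """Group emitted molecules by target node, as a MapReduce shuffle would."""
    buckets: dict[int, list[list[Triple]]] = defaultdict(list)
    for node, m in pairs:
        buckets[node].append(m)
    return dict(buckets)


if __name__ == "__main__":
    triples = [
        ("ex:P1", "ex:interactsWith", "ex:P2"),
        ("ex:P1", "ex:evidence", "_:e1"),
        ("_:e1", "ex:method", "ex:Y2H"),
        ("ex:P2", "ex:label", '"protein 2"'),
    ]
    molecules = decompose(triples)
    # Three molecules: triple 0 alone, triples 1 and 2 joined via _:e1, triple 3 alone.
    for node, mols in sorted(shuffle(map_phase(molecules, 4)).items()):
        print(node, mols)
```

Because every molecule is self-contained with respect to its blank nodes, each one can be shipped to a node independently and the original graph recovered by taking the union of all molecules; this is the property that makes molecules attractive as a unit of distribution compared with splitting a graph into individual triples.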
Keywords: RDF, RDF molecules, MapReduce, distributed processing, data integration
Yuan-Fang Li, Andrew Newman, Jane Hunter, "Scalable Semantics," 2008 IEEE Fourth International Conference on eScience, pp. 111-118, 2008, doi:10.1109/eScience.2008.23