2008 IEEE Fourth International Conference on eScience (2008)
Dec. 7, 2008 to Dec. 12, 2008
Semantic inferencing and querying across large-scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google's MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present some implementation details for our MapReduce-based RDF molecule store. Finally we evaluate the benefits of our approach in the context of the Bio-MANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets.
RDF, RDF molecules, MapReduce, distributed processing, data integration
Y. Li, A. Newman and J. Hunter, "Scalable Semantics ," 2008 IEEE Fourth International Conference on eScience(ESCIENCE), Indianapolis, IN, 2008, pp. 111-118.