2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud) (2017)
Prague, Czech Republic
Aug. 21, 2017 to Aug. 23, 2017
Semantic data are built from triples that contain subjects, predicates, and objects. Alternatively, each triple can be viewed as an edge: the subject and the object are the nodes, and the predicate is the label of the edge. In this view, semantic data define a graph. This graph can be very large, because a semantic dataset may contain millions of triples. Such a dataset can be queried with the SPARQL query language. Since Big Data tools appeared, researchers have tried to evaluate SPARQL queries with these tools. In the last few years, distributed graph analytics tools have appeared as well. The challenge, then, is to use graph analytics tools to evaluate semantic queries on the semantic graph. In this paper we present PSparkql, which extends Sparkql with a parallel query plan. The system uses the Spark GraphX distributed graph analytics tool. We show that fewer edges suffice for the evaluation than Sparkql uses. We also collect statistics about the graph (number of predicates, data properties) to change the evaluation order of the SPARQL query. We compare our results with related work: Sparkql and S2X.
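The two ideas in the abstract, reading triples as labeled edges and using predicate statistics to reorder a query's triple patterns, can be illustrated with a small single-machine sketch. This is not PSparkql itself, which evaluates queries in parallel on Spark GraphX. The toy data, the function names (`build_graph`, `predicate_stats`, `order_patterns`, `evaluate`), the "rarest predicate first" heuristic, and the restriction to patterns with a constant predicate are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: each triple (subject, predicate, object) is read
# as a directed edge subject --predicate--> object.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
    ("acme", "locatedIn", "prague"),
]

# Hypothetical basic graph pattern; terms starting with "?" are variables.
# Assumption: predicates are always constants in this sketch.
QUERY = [
    ("?x", "knows", "?y"),
    ("?x", "worksAt", "?c"),
    ("?c", "locatedIn", "prague"),
]


def build_graph(triples):
    """Index the edges (subject, object) by their predicate label."""
    by_pred = defaultdict(list)
    for s, p, o in triples:
        by_pred[p].append((s, o))
    return by_pred


def predicate_stats(triples):
    """Count how many edges carry each predicate label."""
    return Counter(p for _, p, _ in triples)


def order_patterns(patterns, stats):
    """Assumed heuristic: evaluate patterns with the rarest predicate first.
    sorted() is stable, so ties keep their original query order."""
    return sorted(patterns, key=lambda pat: stats.get(pat[1], 0))


def is_var(term):
    return term.startswith("?")


def bind(binding, term, value):
    """Bind a variable (or check a constant); return False on a conflict."""
    if is_var(term):
        return binding.setdefault(term, value) == value
    return term == value


def match(pattern, edges, binding):
    """Yield every extension of `binding` that matches one triple pattern."""
    s, _, o = pattern
    for edge_s, edge_o in edges:
        new = dict(binding)
        if all(bind(new, t, v) for t, v in ((s, edge_s), (o, edge_o))):
            yield new


def evaluate(patterns, triples):
    """Join the triple patterns one at a time in statistics-driven order."""
    graph = build_graph(triples)
    stats = predicate_stats(triples)
    bindings = [{}]
    for pat in order_patterns(patterns, stats):
        edges = graph.get(pat[1], [])
        bindings = [nb for b in bindings for nb in match(pat, edges, b)]
    return bindings


if __name__ == "__main__":
    print(order_patterns(QUERY, predicate_stats(TRIPLES)))
    print(evaluate(QUERY, TRIPLES))
```

Here `locatedIn` occurs on only one edge, so its pattern is evaluated first and restricts `?c` to a single value before the more frequent `knows` and `worksAt` patterns are joined. The final result binds `?x` to `alice`, `?y` to `bob`, and `?c` to `acme`. A distributed engine would apply the same kind of ordering to reduce the intermediate results exchanged between workers, but how PSparkql does this on GraphX is described in the paper, not in this sketch.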
Big Data, graph theory, parallel databases, query languages, query processing, semantic Web
G. Gombos and A. Kiss, "P-Spar(k)ql: SPARQL Evaluation Method on Spark GraphX with Parallel Query Plan," 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud), Prague, Czech Republic, 2017, pp. 212-219.