2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS) (2015)
Dec. 11, 2015 to Dec. 13, 2015
DOI: http://doi.ieeecomputersociety.org/10.1109/DSDIS.2015.76
Distributed computations on graphs have gained importance with the emergence of large graphs, e.g., in the web or social networks. Frameworks like Hadoop, Giraph and Spark are used for their processing. Yet, they require advanced programming techniques to minimize skew and data shuffling. Declarative, query-like, yet efficient solutions for graph processing, comparable to what Pig offers for general-purpose analytics, are lacking. In this paper we promote the use of declarative datalog with aggregation for large graph processing. We present an implementation which extends Apache Spark with the capability of executing datalog queries. This approach makes it possible to express graph algorithms in a well-studied declarative query language and execute them on an existing and mature infrastructure for distributed computation. At the same time, the data processed with datalog queries is fully integrated with Spark's caching mechanism and can be part of a larger iterative algorithm.
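To illustrate what "datalog with aggregation" means for a graph algorithm, here is a minimal single-machine sketch of connected components, assuming the common two-rule formulation with a `min` aggregate: `cc(X, min<X>) :- node(X).` and `cc(Y, min<Z>) :- edge(X, Y), cc(X, Z).` The rule syntax, the function name `connected_components`, and the treatment of edges as undirected are illustrative assumptions, not the paper's query language or engine. The loop uses semi-naive evaluation: each round joins only the facts that changed in the previous round (the delta) with the edge relation, aggregates candidates per node with `min`, and stops at a fixpoint when no value improves.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Semi-naive evaluation of a datalog program with min-aggregation.

    Hypothetical program (illustrative syntax):
        cc(X, min<X>) :- node(X).
        cc(Y, min<Z>) :- edge(X, Y), cc(X, Z).

    Edges are treated as undirected. Returns a dict mapping each node
    to the smallest node id in its connected component.
    """
    # Build a symmetric adjacency list for the edge relation.
    adj = defaultdict(list)
    all_nodes = set(nodes)
    for x, y in edges:
        adj[x].append(y)
        adj[y].append(x)
        all_nodes.update((x, y))

    # Base rule: every node is initially labelled with itself.
    cc = {x: x for x in all_nodes}
    delta = dict(cc)  # facts derived or improved in the last round

    # Recursive rule, applied only to the delta (semi-naive evaluation).
    while delta:
        candidates = {}
        for x, z in delta.items():
            for y in adj[x]:
                # Join delta with edge, then aggregate per target node with min.
                if y not in candidates or z < candidates[y]:
                    candidates[y] = z
        # Keep only aggregate values that improve on the current ones.
        delta = {y: z for y, z in candidates.items() if z < cc[y]}
        cc.update(delta)
    return cc


if __name__ == "__main__":
    print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

On a cluster, each round of this loop would roughly correspond to joining the delta with the edge relation and reducing by key with `min` (for example, Spark's RDD `join` followed by `reduceByKey`), with the relations kept in Spark's cache between rounds; how the paper's engine actually plans and executes such queries is not described here.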
Keywords: Sparks, Optimization, Clustering algorithms, Programming, Iterative methods, Semantics, Conferences
J. Sroka, M. Rogala, M. Adamczyk and J. Hidders, "A Datalog Engine for Iterative Graph Algorithms on Large Clusters," 2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS), Sydney, Australia, 2015, pp. 113-114.