2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT) (2012)
Minneapolis, MN, USA
Sept. 19, 2012 to Sept. 23, 2012
ISBN: 978-1-5090-6609-4
pp: 345-354
Abdullah Gharaibeh , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Lauro Beltrao Costa , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Elizeu Santos-Neto , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
Matei Ripeanu , Department of Electrical and Computer Engineering, The University of British Columbia, Canada
ABSTRACT
Large, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph processing algorithms entail memory access patterns with poor locality, data-dependent parallelism, and a low compute-to-memory access ratio. Additionally, most real-world graphs have a low diameter and a highly heterogeneous node degree distribution. Partitioning these graphs while simultaneously achieving access locality and load balancing is difficult, if not impossible. This paper demonstrates the feasibility of graph processing on heterogeneous platforms (i.e., platforms that include both CPUs and GPUs) as a cost-effective approach to addressing these challenges. To this end, this work (i) presents and evaluates a performance model that estimates the achievable performance on heterogeneous platforms; (ii) introduces TOTEM, a processing engine based on the Bulk Synchronous Parallel (BSP) model that offers a convenient environment to simplify the implementation of graph algorithms on heterogeneous platforms; and (iii) demonstrates TOTEM's efficiency by implementing and evaluating two graph algorithms, PageRank and breadth-first search. TOTEM achieves speedups close to the model's prediction and applies a number of optimizations that enable speedups growing linearly with the share of the graph offloaded to the accelerators.
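To make the BSP execution model mentioned above more concrete, the following is a minimal Python sketch of PageRank run as BSP supersteps over a graph split into partitions. For example, one partition could stand in for the CPU share and another for the share offloaded to an accelerator. This is not TOTEM's implementation or API. The function names (`partition_edges`, `bsp_pagerank`), the `owner` vertex-to-partition mapping, the edge-list representation, and the per-remote-vertex message aggregation are all illustrative assumptions. Each superstep has three parts: a compute phase in which each partition processes only its own edges, a communication phase in which buffered contributions for vertices owned by another partition are delivered, and a barrier after which all ranks are updated together.

```python
from collections import defaultdict


def partition_edges(edges, owner):
    """Group edges by the partition that owns the source vertex (hypothetical layout)."""
    parts = defaultdict(list)
    for src, dst in edges:
        parts[owner(src)].append((src, dst))
    return parts


def bsp_pagerank(edges, num_vertices, owner, num_parts=2, damping=0.85, supersteps=30):
    """PageRank expressed as BSP supersteps over num_parts partitions.

    owner(v) must return an int in range(num_parts); it is an assumed,
    illustrative stand-in for a CPU/accelerator partitioning decision.
    """
    n = num_vertices
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    parts = partition_edges(edges, owner)
    rank = [1.0 / n] * n

    for _ in range(supersteps):
        # Compute phase: every partition works independently on its own edges.
        # Contributions to remote vertices are buffered, not written directly.
        inbox = [defaultdict(float) for _ in range(num_parts)]
        outbox = [defaultdict(float) for _ in range(num_parts)]
        for p in range(num_parts):
            for src, dst in parts[p]:
                contrib = rank[src] / out_degree[src]
                if owner(dst) == p:
                    inbox[p][dst] += contrib
                else:
                    # Combine messages per remote vertex to shrink traffic.
                    outbox[p][dst] += contrib

        # Communication phase: deliver the aggregated boundary messages.
        for p in range(num_parts):
            for dst, value in outbox[p].items():
                inbox[owner(dst)][dst] += value

        # Barrier: all partitions have their inputs, so update ranks together.
        # Rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        rank = [
            (1.0 - damping) / n + damping * (inbox[owner(v)][v] + dangling / n)
            for v in range(n)
        ]
    return rank


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
    # Even vertices in partition 0, odd vertices in partition 1.
    print(bsp_pagerank(edges, 4, owner=lambda v: v % 2, num_parts=2))
```

Because messages are combined per remote vertex before delivery, the communication volume per superstep in this sketch is bounded by the number of distinct boundary vertices rather than the number of cut edges. On a real CPU+GPU platform that traffic would cross the host-device interconnect, which is one reason the choice of partitioning matters. Changing `owner` changes only where work and messages go, not the resulting ranks.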
INDEX TERMS
Algorithm design and analysis, Graphics processing units, Engines, Parallel processing, Partitioning algorithms, Computer architecture
CITATION
Abdullah Gharaibeh, Lauro Beltrao Costa, Elizeu Santos-Neto, Matei Ripeanu, "A yoke of oxen and a thousand chickens for heavy lifting graph processing", 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 345-354, 2012, doi: