Cluster Computing and the Grid, IEEE International Symposium on (2013)
Delft, Netherlands
May 13, 2013 to May 16, 2013
ISBN: 978-1-4673-6465-2
pp: 277-284
ABSTRACT
Fast processing of extremely large-scale graphs is becoming increasingly important in various domains such as health care, social networks, intelligence, systems biology, and electric power grids. The GIM-V algorithm, based on the MapReduce programming model, is designed as a general graph processing method for supporting petabyte-scale graph data. Meanwhile, recent large-scale data-intensive computing systems tend to employ GPU accelerators to gain high peak performance and high memory bandwidth. However, whether GPUs can effectively accelerate the GIM-V algorithm, and which optimization techniques apply, remains an open problem. To address this problem, we implemented a multi-GPU-based GIM-V application with load balance optimization between GPU devices. Our implementation extends an existing MapReduce library to support multi-GPU environments using the MPI library and optimizes load balance between GPU devices by employing task scheduling-based graph partitioning. We evaluated our implementation on the TSUBAME2.0 supercomputer using 256 nodes (6144 hyper-threaded CPU cores, 768 GPUs). The results show that our GPU-based implementation achieved 87.04 ME/s on 2^30 (1.07 billion) vertices and 2^34 (17.2 billion) edges, and ran 1.52 times faster than the CPU-based naive implementation with 2^29 vertices and 2^33 edges. We also studied the performance characteristics of our implementation and load balance optimization technique.
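For readers unfamiliar with GIM-V (generalized iterated matrix-vector multiplication), the following is a minimal single-process sketch of one way the three GIM-V operations (combine2, combineAll, assign) can be used to express PageRank. It is an illustration only, not the multi-GPU MPI implementation described in the abstract. The function names `gimv_step` and `pagerank`, the edge-list layout, and the damping factor of 0.85 are assumptions made for this example.

```python
from collections import defaultdict


def gimv_step(matrix, v, combine2, combine_all, assign):
    """Run one GIM-V iteration over (row i, column j, value m_ij) triples."""
    # Map-like phase: combine each matrix element m_ij with vector element v_j
    # and group the partial results by destination row i.
    partial = defaultdict(list)
    for i, j, m_ij in matrix:
        partial[i].append(combine2(m_ij, v[j]))
    # Reduce-like phase: merge the partial results of each row and
    # decide the new value of v_i from the old one and the merged one.
    return [assign(v[i], combine_all(partial[i])) for i in range(len(v))]


def pagerank(edges, n, damping=0.85, iters=50):
    """PageRank on directed (src, dst) edges, expressed with GIM-V operations."""
    out_degree = defaultdict(int)
    for src, _dst in edges:
        out_degree[src] += 1
    # Column-normalized adjacency matrix: entry (dst, src) = 1 / out_degree(src).
    matrix = [(dst, src, 1.0 / out_degree[src]) for src, dst in edges]
    v = [1.0 / n] * n
    for _ in range(iters):
        v = gimv_step(
            matrix,
            v,
            combine2=lambda m_ij, v_j: damping * m_ij * v_j,
            combine_all=lambda xs: (1.0 - damping) / n + sum(xs),
            assign=lambda old, new: new,
        )
    return v


if __name__ == "__main__":
    # Three vertices in a directed cycle: 0 -> 1 -> 2 -> 0.
    print(pagerank([(0, 1), (1, 2), (2, 0)], n=3))
```

Swapping the three operations changes the algorithm without touching `gimv_step`, which is what makes the formulation a fit for MapReduce: the grouping by row corresponds to the shuffle between the map and reduce phases.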
INDEX TERMS
MapReduce, Large-scale Graph Processing, GPGPU
CITATION

K. Shirahata, H. Sato, T. Suzumura and S. Matsuoka, "A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for Large-Scale Heterogeneous Supercomputers," 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Delft, 2013, pp. 277-284.
doi:10.1109/CCGrid.2013.85