2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (2012)
Shanghai, China
May 21, 2012 to May 25, 2012
Graph algorithms are notorious for achieving poor speedup on parallel architectures. They tend to suffer from irregular dependencies and a high synchronization cost, which prevent efficient execution on distributed memory machines. Hence such algorithms are mostly parallelized on shared memory machines. However, current commodity shared memory machines do not typically offer enough parallelism to process these problems. In this paper, we present an early investigation of the scalability of such algorithms on Intel's upcoming Many Integrated Core (Intel MIC) architecture which, when it is released in 2012, is expected to provide more than 50 physical cores with SMT capability. The Intel MIC architecture can be programmed through many programming models; here we investigate the three most popular of these, namely OpenMP, Cilk Plus and Intel's TBB. We present scalability results for a parallel graph coloring algorithm, three variations of a breadth-first search algorithm, and a microbenchmark for irregular computations using these three programming models. Our results on a prototype board show that the multi-threaded architecture of Intel MIC can be used effectively to hide latencies in irregular applications and achieve almost perfect speedup.
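To make the kind of kernel discussed above concrete, the following is a minimal sketch of a level-synchronous breadth-first search written with OpenMP. It is not taken from the paper and is not claimed to match any of the three BFS variations the authors evaluate; the `CsrGraph` type, the `bfs_levels` function, and the chunk size of 64 are assumptions chosen for illustration. The irregular accesses to `adj` and `level` are the source of the memory latency that hardware multithreading is meant to hide.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph type (not from the paper), in compressed sparse row
// form: the neighbors of vertex v are adj[xadj[v]] .. adj[xadj[v + 1] - 1].
struct CsrGraph {
    std::vector<std::size_t> xadj;
    std::vector<int> adj;
    int num_vertices() const { return static_cast<int>(xadj.size()) - 1; }
};

// Level-synchronous BFS. Returns the BFS level of every vertex, or -1 for
// vertices that are unreachable from `source`.
std::vector<int> bfs_levels(const CsrGraph& g, int source) {
    const int n = g.num_vertices();
    assert(source >= 0 && source < n);

    // Atomic levels let several threads race to claim the same vertex safely.
    std::vector<std::atomic<int>> level(n);
    for (auto& l : level) l.store(-1, std::memory_order_relaxed);
    level[source].store(0);

    std::vector<int> frontier{source};
    for (int depth = 1; !frontier.empty(); ++depth) {
        std::vector<int> next;
        const long m = static_cast<long>(frontier.size());
        #pragma omp parallel
        {
            std::vector<int> local;  // per-thread part of the next frontier
            // Dynamic scheduling, since vertex degrees (work per iteration) vary.
            #pragma omp for schedule(dynamic, 64) nowait
            for (long i = 0; i < m; ++i) {
                const int u = frontier[i];
                for (std::size_t e = g.xadj[u]; e < g.xadj[u + 1]; ++e) {
                    const int v = g.adj[e];
                    int expected = -1;
                    // Only the thread that wins the compare-and-swap enqueues v.
                    if (level[v].compare_exchange_strong(expected, depth)) {
                        local.push_back(v);
                    }
                }
            }
            // Merge the per-thread buffers one thread at a time.
            #pragma omp critical
            next.insert(next.end(), local.begin(), local.end());
        }
        frontier.swap(next);
    }

    std::vector<int> result(n);
    for (int v = 0; v < n; ++v) result[v] = level[v].load();
    return result;
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs sequentially, which makes the parallel version easy to check against a single-threaded run. The one synchronization point per BFS level is a small-scale instance of the synchronization cost the abstract refers to.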
Microwave integrated circuits, Programming, Instruction sets, Computer architecture, Computational modeling, Kernel, Image color analysis, breadth-first search, Graph algorithm, unstructured irregular computation, scalability, multi-threaded architectures, graph coloring
E. Saule and U. V. Catalyurek, "An Early Evaluation of the Scalability of Graph Algorithms on the Intel MIC Architecture," 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), Shanghai, China, 2012, pp. 1629-1639.