Parallel and Distributed Processing Symposium, International (2012)
Shanghai, China
May 21, 2012 to May 25, 2012
ISSN: 1530-2075
ISBN: 978-1-4673-0975-2
pp: 680-690
In this paper we propose a parallel programming model that combines two well-known execution models: Single Instruction, Multiple Data (SIMD) and Single Program, Multiple Data (SPMD). The combined model supports SIMD-style data parallelism in a global address space and SPMD-style task parallelism in a local address space. One of the most important features of the combined model is that data communication is expressed by global data assignments instead of message passing. We implement this combined programming model in Python, making parallel programming with Python both highly productive and high-performing on distributed memory multi-core systems. We base the SIMD data parallelism on DistNumPy, an auto-parallelizing version of the Numerical Python (NumPy) package that allows sequential NumPy programs to run on distributed memory architectures. We implement the SPMD task parallelism as an extension to DistNumPy that enables each process to access the local part of a shared array directly. To harvest the multi-core benefits of modern processors, we exploit multi-threading in both the SIMD and SPMD execution models. The multi-threading is completely transparent to the user -- it is implemented in the runtime with OpenMP and by using multi-threaded libraries when available. We evaluate the implementation of the combined programming model with several scientific computing benchmarks on two representative multi-core distributed memory systems -- an Intel Nehalem cluster with InfiniBand interconnects and a Cray XE6 supercomputer -- using up to 1536 cores. The benchmarking results demonstrate good, scalable performance.
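The central idea -- data communication expressed as a global array assignment rather than message passing -- can be sketched with plain NumPy, since DistNumPy adopts the NumPy interface. This is an illustrative sketch only: the arrays here live in one process, whereas under DistNumPy they would be block-distributed and the slice assignment would move data between nodes behind the scenes.

```python
import numpy as np  # DistNumPy mirrors the NumPy interface

# Global arrays; under DistNumPy these would be distributed
# across the processes of the parallel job.
a = np.arange(16.0).reshape(4, 4)
b = np.zeros((4, 4))

# Data communication as a global assignment: copying one slice
# into another may, in the distributed case, transfer data
# between nodes, yet the program reads like sequential NumPy.
b[1:3, :] = a[0:2, :]

print(b[1, 0])  # element copied from a[0, 0], i.e. 0.0
```

The SPMD extension described in the abstract complements this style by letting each process operate on its local block of such a shared array without going through the global view.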
Arrays, Libraries, Data models, Parallel processing, Computational modeling, Parallel programming, Parallel computing, Scientific computing, Python

Y. Zheng, M. R. Kristensen and B. Vinter, "PGAS for Distributed Numerical Python Targeting Multi-core Clusters," Parallel and Distributed Processing Symposium, International (IPDPS), Shanghai, China, 2012, pp. 680-690.