Parallel Programming Models for Heterogeneous Multicore Architectures
September/October 2010 (vol. 30 no. 5)
pp. 42-53
Roger Ferrer, Barcelona Supercomputing Center
Pieter Bellens, Barcelona Supercomputing Center
Vicenc Beltran, Barcelona Supercomputing Center
Marc Gonzalez, Barcelona Supercomputing Center
Xavier Martorell, Barcelona Supercomputing Center
Rosa M. Badia, Barcelona Supercomputing Center
Eduard Ayguade, Barcelona Supercomputing Center
Jae-Seung Yeom, Barcelona Supercomputing Center
Scott Schneider, Barcelona Supercomputing Center
Konstantinos Koukos, Barcelona Supercomputing Center
Michail Alvanos, Barcelona Supercomputing Center
Dimitrios S. Nikolopoulos, Barcelona Supercomputing Center
Angelos Bilas, Barcelona Supercomputing Center

This article evaluates the scalability and productivity of six parallel programming models for heterogeneous architectures. It finds that task-based models using code and data annotations require the least programming effort while sustaining near-best performance. Achieving this result, however, requires both extending the programming models to control locality and granularity and ensuring proper interoperability with platform-specific optimizations.
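
To make the idea of "code and data annotations" concrete, the following is a toy sketch of how a runtime might derive task dependences from declared inputs and outputs. It is not the API of any model evaluated in the article: `TaskGraph`, `task`, `inputs`, `outputs`, and the three example tasks are hypothetical names chosen for illustration, and the "waves" stand in for groups of tasks that a real runtime could offload to accelerator cores concurrently.

```python
from collections import defaultdict


class TaskGraph:
    """Toy runtime (hypothetical): records annotated calls and derives dependences."""

    def __init__(self):
        self.tasks = []                    # (function, args) in program order
        self.deps = defaultdict(set)       # task index -> indices it must wait for
        self.last_writer = {}              # datum name -> index of last writing task
        self.readers = defaultdict(list)   # datum name -> readers since that write

    def task(self, inputs=(), outputs=()):
        """Annotation declaring which named data a function reads and writes."""
        def wrap(fn):
            def submit(*args):
                tid = len(self.tasks)
                self.tasks.append((fn, args))
                for name in inputs:        # read-after-write dependence
                    if name in self.last_writer:
                        self.deps[tid].add(self.last_writer[name])
                    self.readers[name].append(tid)
                for name in outputs:       # write-after-write, write-after-read
                    if name in self.last_writer:
                        self.deps[tid].add(self.last_writer[name])
                    self.deps[tid].update(self.readers[name])
                    self.readers[name] = []
                    self.last_writer[name] = tid
                self.deps[tid].discard(tid)  # a task never waits on itself
                return tid
            return submit
        return wrap

    def run(self):
        """Executes tasks in waves; tasks within one wave are mutually independent."""
        done, waves = set(), []
        while len(done) < len(self.tasks):
            ready = [t for t in range(len(self.tasks))
                     if t not in done and self.deps[t] <= done]
            for t in ready:
                fn, args = self.tasks[t]
                fn(*args)
            done.update(ready)
            waves.append(ready)
        return waves


graph = TaskGraph()
data = {"a": 1.0, "b": 2.0, "c": 0.0, "d": 0.0}


@graph.task(inputs=("a",), outputs=("c",))
def scale():
    data["c"] = 10 * data["a"]


@graph.task(inputs=("b",), outputs=("d",))
def shift():
    data["d"] = data["b"] + 1


@graph.task(inputs=("c", "d"), outputs=("a",))
def combine():
    data["a"] = data["c"] + data["d"]


# Calls only record tasks; the program text stays sequential.
scale()
shift()
combine()
waves = graph.run()
```

In this sketch the programmer writes ordinary sequential calls and only annotates data directions; the runtime works out that `scale` and `shift` are independent and that `combine` must wait for both. Controlling locality and granularity, which the abstract identifies as necessary extensions, is deliberately left out.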

Index Terms:
concurrent programming, environments for multiprocessor systems, hardware/software interfaces, heterogeneous (hybrid) systems
Citation:
Roger Ferrer, Pieter Bellens, Vicenc Beltran, Marc Gonzalez, Xavier Martorell, Rosa M. Badia, Eduard Ayguade, Jae-Seung Yeom, Scott Schneider, Konstantinos Koukos, Michail Alvanos, Dimitrios S. Nikolopoulos, Angelos Bilas, "Parallel Programming Models for Heterogeneous Multicore Architectures," IEEE Micro, vol. 30, no. 5, pp. 42-53, Sept.-Oct. 2010, doi:10.1109/MM.2010.94