Issue No. 08 - August (2012 vol. 45)
pp. 42-52
Benedict R. Gaster, Advanced Micro Devices
Lee Howes, Advanced Micro Devices
Heterogeneous parallel primitives (HPP) addresses two major shortcomings in current GPGPU programming models: it supports full composability by defining abstractions and increases flexibility in execution by introducing braided parallelism.
Graphics processing unit, Programming, Performance evaluation, Parallel processing, Indexes, Hardware, Multithreading, hardware, massively threaded computing systems, heterogeneous parallel primitives, braided parallelism, persistent threading, GPGPU programming, data-parallel execution, distributed arrays
Benedict R. Gaster, Lee Howes, "Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck?", Computer, vol.45, no. 8, pp. 42-52, August 2012, doi:10.1109/MC.2012.257