HSA Connections - Home
Proof-of-Concept C++17 Parallel STL Offloading for GCC/libstdc++
MAY 10, 2018 19:47 PM
A+ A A-

Proof-of-Concept C++17 Parallel STL Offloading for GCC/libstdc++

Introduction:
 
Parmance and General Processor Technologies have been collaborating on C++17 Parallel STL offloading support based on HSA (Heterogeneous System Architecture) and GCC (GNU Compiler Collection). A working proof-of-concept has been now released and made available in https://github.com/parmance/par_offload. This post is a high level overview of the project.
 
Heterogeneous Offloading and C++17
 
The C++17 standard released in December 2017 adds execution policies in its standard template library (STL) algorithm definition. Execution policies enable the programmer to declare that the algorithm library call, along with any user-defined functionality the call uses, is safe to execute in parallel. The user-defined functionality is referred to as "element access functions" (EAF) by the standard.
 
The PSTL (Parallel Standard Template Library) of C++17 focuses on forward progress guarantees and their implications to parallelization safety on homogeneous processors. 
 
However, there is no "parallel heterogeneous offloading execution policy" yet in the C++ standard; there seems to be an implicit assumption that the parallel execution will occur in the same processor where it was invoked. To make the offloading decisions explicit, for our offloading implementation, we defined a new execution policy type 'parallel_offload_policy' (par_offload) which the programmer can use to declare "heterogeneous offload" or "multiple-ISA" safety for the involved user-defined functions. 
 
A call to the 'transform' PSTL function with this policy looks like the following:
 
std::transform(std::execution::experimental::par_offload,
               pixel_data.begin(), pixel_data.end(),
               pixel_data.begin(),
    [](char c) -> char {
      return c * 16;
    });
 
In this case, a lambda function was used to iterate over all the elements in the pixel_data of std::vector type with the processing offloaded to a heterogeneous device, if one is available.
 
Shared Virtual Memory
 
Explicit data management is problematic in the case of offloading general purpose C/C++ programs that assume a unified address space and allow passing pointers to functions without attached size information. Indeed, a single unified coherent address space across all the processors in a heterogeneous platform would remove a major obstacle in heterogeneous platforms and make programming such devices much simpler. 
 
Heterogeneous System Architecture (HSA) (1.0 published in March 2015) is a language neutral standard targeting heterogeneous systems. It defines a cache-coherent shared global virtual memory as a core feature. That is, an HSAF heterogeneous platform supports data sharing across devices (called agents) as easily as in "homogeneous" C/C++ multithreaded programming.
 
In the GCC PSTL offloading work we used the HSA Runtime as a heterogeneous platform middleware and rely on the coherent system memory capabilities of the HSA Full Profile. HSA is interesting for this use case most importantly due to its shared heterogeneous memory requirement that is expected to work seamlessly with C/C++ memory model. Also there is a wide selection of open source components implementing the different parts of the specs available. For example, its intermediate language HSAIL has both front end and backend support already in upstream GCC. There are also implementations of its runtime API to enable development and testing via offloading to CPU based targets.
 
Implementation Status and Future Plans
 
We now have a proof-of-concept offloading implementation of several PSTL algorithms running with multiple ways to define the user-specified functionality working. The implementation supports lambda functors (with and without captures), C functions, std::functions containing C functions, function objects, and user defined data types. 
 
Next we plan to properly integrate the prototype to libstdc++ and GCC, implement the rest of the algorithms and finally optimize the performance.
 
Links and references
 
FIRST
PREV
NEXT
LAST
Page(s):
[%= name %]
[%= createDate %]
[%= comment %]
Share this:
Please login to enter a comment:
RESET

Computing Now Blogs
Business Intelligence
by Drew Hendricks
by Keith Peterson
Cloud Computing
A Cloud Blog: by Irena Bojanova
The Clear Cloud: by STC Cloud Computing
Careers
Computing Careers: by Lori Cameron
Display Technologies
Enterprise Solutions
Enterprise Thinking: by Josh Greenbaum
Healthcare Technologies
The Doctor Is In: Dr. Keith W. Vrbicky
Heterogeneous Systems
Hot Topics
NealNotes: by Neal Leavitt
Industry Trends
Internet Of Things
Sensing IoT: by Irena Bojanova
Software
Software Technologies: by Christof Ebert