Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
Arnamoy Bhattacharyya , Department of Computing Science, University of Alberta, Edmonton, Canada
Figure 1 shows the performance of three parallel versions of the SPEC2006 and PolyBench/C benchmarks: auto-SIMDized; auto-SIMDized plus auto-OpenMP, both produced by bgxlc_r; and auto-SIMDized plus auto-OpenMP plus loops speculatively parallelized by an automatic speculative parallelization framework developed in this work. The speculative loops in lbm have 98% coverage, which accounts for its speedup, whereas in bzip2 (35%) and dynprog (26%) the poor coverage of the speculative loops leads to overhead. h264ref has the largest number of speculatively parallelized loops (47), but most of them contain function calls that introduce dependences, which causes a slowdown: only 12% of its speculative threads commit successfully. Filtering out speculative execution of loops that contain function calls that are not side-effect free addresses this misspeculation overhead. cholesky and dynprog suffer L1 cache misses caused by long-running (LR) mode (12% and 10%, respectively), while jacobi and seidel suffer large increases in dynamic path length (112% and 123%, respectively, over the sequential version).
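Why coverage dominates can be made concrete with Amdahl's law: if only a fraction c of sequential execution time runs in parallelized loops, whole-program speedup on n threads is at most 1 / ((1 - c) + c / n), before any speculation cost is counted. The Python sketch below applies that bound to the coverage figures quoted for lbm, bzip2 and dynprog. The thread count of 16 and the function names are illustrative assumptions, not the configuration or method used in the paper, and the model ignores rollback, LR-mode cache effects and path-length growth.

```python
def amdahl_speedup(coverage: float, threads: int) -> float:
    """Upper bound on whole-program speedup (Amdahl's law) when a fraction
    `coverage` of sequential time is spread perfectly over `threads` threads.
    Speculation overheads are deliberately ignored."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial = 1.0 - coverage          # part that stays sequential
    parallel = coverage / threads    # part that is divided among threads
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    # Coverage values taken from the abstract; 16 threads is a hypothetical choice.
    coverages = {"lbm": 0.98, "bzip2": 0.35, "dynprog": 0.26}
    for name, c in coverages.items():
        print(f"{name:8s} coverage={c:.2f} bound={amdahl_speedup(c, 16):.2f}x")
```

Under these assumptions lbm's bound is above 12x, while bzip2 and dynprog cannot exceed roughly 1.5x and 1.3x even with zero overhead, so any misspeculation or LR-mode cost quickly erases their gains.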
Arnamoy Bhattacharyya, "Do Inputs Matter? Using Data-Dependence Profiling to Evaluate Thread Level Speculation in BG/Q," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, p. 401, 2013, doi:10.1109/PACT.2013.6618836