Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 353-362
Majedul Haque Sujon, Dept. of Comput. Sci., Univ. of TX at San Antonio, San Antonio, TX, USA
R. Clint Whaley, Sch. of EE & CS, Louisiana State Univ., Baton Rouge, LA, USA
Qing Yi, Dept. of Comput. Sci., Univ. of Colorado, Colorado Springs, CO, USA
ABSTRACT
Modern architectures increasingly rely on SIMD vectorization to improve performance for floating-point-intensive scientific applications. However, existing compiler optimization techniques for automatic vectorization are inhibited by the presence of unknown control flow surrounding partially vectorizable computations. In this paper, we present a new approach, speculative vectorization, which speculates past dependent branches to aggressively vectorize computational paths that are expected to be taken frequently at runtime, while simply restarting the calculation using scalar instructions when the speculation fails. We have integrated our technique into an iterative optimizing compiler and have employed empirical tuning to select the profitable paths for speculation. When applied to optimize 9 floating-point benchmarks, our optimizing compiler achieved speedups of up to 6.8X for single-precision and 3.4X for double-precision kernels using AVX, while vectorizing some operations considered not vectorizable by prior techniques.
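The idea of speculating past a dependent branch and restarting in scalar code on failure can be illustrated with a small sketch. The example below is not the authors' compiler output: the kernel (index of the element with the largest absolute value), the function names, and the block width VLEN are assumptions chosen for illustration, and the lane loop is written in plain C as a stand-in for a SIMD compare rather than with AVX intrinsics.

```c
#include <stddef.h>

#define VLEN 8  /* assumed lane count: 8 floats fit in one 256-bit AVX register */

/* Local absolute-value helper so the sketch needs no math library. */
static float absf(float v) { return v < 0.0f ? -v : v; }

/* Reference scalar loop: index of the first element with the largest |x[i]|.
 * The update of amax/imax sits behind a branch that depends on earlier
 * iterations, which is what blocks straightforward vectorization. */
size_t iamax_scalar(const float *x, size_t n)
{
    size_t imax = 0;
    float amax = -1.0f;
    for (size_t i = 0; i < n; i++) {
        float a = absf(x[i]);
        if (a > amax) {          /* expected to be taken rarely */
            amax = a;
            imax = i;
        }
    }
    return imax;
}

/* Speculative version: process VLEN elements at a time, assuming the
 * frequent path (no element beats the current maximum). If any lane
 * violates that assumption, redo the same block with the scalar code. */
size_t iamax_speculative(const float *x, size_t n)
{
    size_t imax = 0;
    float amax = -1.0f;
    size_t i = 0;

    for (; i + VLEN <= n; i += VLEN) {
        /* Speculated path: one compare per lane, no state updated.
         * In real SIMD code this would be a vector compare plus a mask test. */
        int any_taken = 0;
        for (size_t l = 0; l < VLEN; l++)
            any_taken |= (absf(x[i + l]) > amax);

        if (!any_taken)
            continue;            /* speculation held for the whole block */

        /* Speculation failed: restart this block using scalar instructions. */
        for (size_t l = 0; l < VLEN; l++) {
            float a = absf(x[i + l]);
            if (a > amax) {
                amax = a;
                imax = i + l;
            }
        }
    }

    /* Scalar cleanup for the remaining n % VLEN elements. */
    for (; i < n; i++) {
        float a = absf(x[i]);
        if (a > amax) {
            amax = a;
            imax = i;
        }
    }
    return imax;
}
```

Because the speculated path updates no state, a failed speculation only costs the redundant compare for that block, and the scalar restart yields the same result as the reference loop. Which path is worth speculating on depends on the input; the abstract describes selecting profitable paths by empirical tuning.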
INDEX TERMS
Vectors, Kernel, Optimization, Algorithm design and analysis, Optimizing compilers, Benchmark testing, Safety, task-based programs, many-core, sampling, simulation
CITATION
Majedul Haque Sujon, R. Clint Whaley, Qing Yi, "Vectorization past dependent branches through speculation", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 353-362, 2013, doi:10.1109/PACT.2013.6618831