Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2007)
Sept. 15, 2007 to Sept. 19, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2007.41
Jaewook Shin, Argonne National Laboratory, USA
Single instruction multiple data (SIMD) functional units are ubiquitous in modern microprocessors. Effective use of these SIMD functional units is essential in achieving the highest possible performance. Automatic generation of SIMD instructions in the presence of control flow is challenging, however, not only because SIMD code is hard to generate in the presence of arbitrarily complex control flow, but also because SIMD code that executes the instructions in all control paths may be slower than the scalar original, which may bypass a large portion of the code. One promising technique introduced recently involves inserting branches-on-superword-condition-codes (BOSCCs) to bypass vector instructions. In this paper, we describe two techniques that improve on the previous approach. First, BOSCCs are generated in a nested fashion so that even BOSCCs themselves can be bypassed by other BOSCCs. Second, we generate all vec_any_* instructions to bypass even some predicate-defining instructions. We implemented these techniques in a vectorizing compiler. On 14 kernels, the compiler achieves distinct speedups, including 1.99X over the previous technique that generates single-level BOSCCs and vec_any_ne only.
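The idea of nested BOSCCs can be illustrated with a minimal sketch in portable C. This is not the paper's compiler output: it assumes a 4-lane superword emulated with plain arrays, and the names `VL`, `any_ne_zero`, `scalar_kernel`, and `boscc_kernel` are hypothetical. The helper `any_ne_zero` only mimics the role of a `vec_any_ne` test against zero; the example loop itself is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4 /* lanes per superword (an assumption for this sketch) */

/* Stand-in for a vec_any_ne(mask, 0) test: nonzero if any lane is set. */
int any_ne_zero(const int *mask) {
    for (int l = 0; l < VL; l++) {
        if (mask[l] != 0) return 1;
    }
    return 0;
}

/* Scalar original: the inner 'if' is only reached when the outer one holds. */
void scalar_kernel(const int *a, int *b, size_t n, int limit) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0) {
            b[i] += a[i];
            if (b[i] > limit) b[i] = limit;
        }
    }
}

/* Predicated "vector" version with nested BOSCCs.
   Returns how many superwords the outer BOSCC bypassed entirely. */
size_t boscc_kernel(const int *a, int *b, size_t n, int limit) {
    assert(n % VL == 0); /* sketch handles whole superwords only */
    size_t bypassed = 0;
    for (size_t i = 0; i < n; i += VL) {
        int p[VL], q[VL];

        /* Predicate-defining compare for the outer condition. */
        for (int l = 0; l < VL; l++) p[l] = a[i + l] > 0;

        /* Outer BOSCC: no lane is active, so skip all guarded work,
           including the inner BOSCC below. */
        if (!any_ne_zero(p)) {
            bypassed++;
            continue;
        }

        /* Predicated add, written as a per-lane select. */
        for (int l = 0; l < VL; l++) {
            b[i + l] = p[l] ? b[i + l] + a[i + l] : b[i + l];
        }

        /* Inner predicate is the conjunction of both conditions. */
        for (int l = 0; l < VL; l++) q[l] = p[l] && (b[i + l] > limit);

        /* Inner BOSCC, nested inside the region guarded by the outer one. */
        if (any_ne_zero(q)) {
            for (int l = 0; l < VL; l++) {
                b[i + l] = q[l] ? limit : b[i + l];
            }
        }
    }
    return bypassed;
}
```

Calling `boscc_kernel` and `scalar_kernel` on the same inputs should give identical contents of `b`; the return value of `boscc_kernel` shows how often the outer branch skipped a whole superword. On real hardware the per-lane loops would be single vector instructions, and whether a branch pays off depends on how often the predicate is all-false.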
Jaewook Shin, "Introducing Control Flow into Vectorized Code", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 280-291, 2007, doi:10.1109/PACT.2007.41