Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2013)
Edinburgh, United Kingdom
Sept. 7, 2013 to Sept. 11, 2013
ISSN: 1089-795X
ISBN: 978-1-4799-1018-2
pp: 363-374
Youngjoon Jo, Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
Michael Goldfarb, Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
Milind Kulkarni, Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
ABSTRACT
Repeated tree traversals are ubiquitous in domains such as scientific simulation, data mining, and graphics. Modern commodity processors support SIMD instructions, and using these instructions to process multiple traversals at once has the potential to provide substantial performance improvements. Unfortunately, these algorithms often feature highly divergent traversals that inhibit efficient SIMD utilization, to the point that other, less profitable sources of vectorization must be exploited instead. Previous work has proposed traversal splicing, a locality transformation for tree traversals that dynamically reorders traversals according to their previous behavior, on the insight that traversals which have behaved similarly so far are likely to behave similarly in the future. In this work, we cast this dynamic reordering as scheduling for efficient SIMD execution, and show that it can dramatically improve the SIMD utilization of diverging traversals, bringing it close to ideal. For five irregular tree traversal algorithms, our techniques deliver speedups of 2.78x on average over baseline implementations. Furthermore, our techniques can effectively SIMDize algorithms that prior manual vectorization attempts could not.
INDEX TERMS
Splicing, Schedules, Photonics, Sorting, Context, Heuristic algorithms, Scheduling
CITATION
Youngjoon Jo, Michael Goldfarb, Milind Kulkarni, "Automatic vectorization of tree traversals", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp. 363-374, 2013, doi:10.1109/PACT.2013.6618832