The Community for Technology Leaders
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (2004)
Antibes Juan-les-Pins, France
Sept. 29, 2004 to Oct. 3, 2004
ISSN: 1089-795X
ISBN: 0-7695-2229-7
pp: 177-188
Ram Rangan , Princeton University
Neil Vachharajani , Princeton University
Manish Vachharajani , Princeton University
David I. August , Princeton University
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the performance of microprocessors, certain codes remain elusive. In particular, codes containing recursive data structure (RDS) traversal loops have been largely immune to ILP optimizations, due to the fundamental serialization and variable latency of the loop-carried dependence through a pointer-chasing load. To address these and other situations, we introduce decoupled software pipelining (DSWP), a technique that statically splits a single-threaded sequential loop into multiple non-speculative threads, each of which performs useful computation essential for overall program correctness. The resulting threads execute on thread-parallel architectures such as simultaneous multithreaded (SMT) cores or chip multiprocessors (CMP), expose additional instruction level parallelism, and tolerate latency better than the original single-threaded RDS loop. To reduce overhead, these threads communicate using a synchronization array, a dedicated hardware structure for pipelined inter-thread communication. DSWP used in conjunction with the synchronization array achieves an 11% to 76% speedup in the optimized functions on both statically and dynamically scheduled processors.
Ram Rangan, Neil Vachharajani, Manish Vachharajani, David I. August, "Decoupled Software Pipelining with the Synchronization Array", Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, vol. 00, no. , pp. 177-188, 2004, doi:10.1109/PACT.2004.10003
92 ms
(Ver 3.3 (11022016))