In this paper, we introduce novel compiler optimization techniques that reduce the number of operations performed in critical sections of explicitly parallel programs. Specifically, we focus on three code transformations: 1) Partial Strength Reduction (PSR) of critical sections, which replaces critical sections with non-critical sections on certain control flow paths; 2) Critical Load Elimination (CLE), which replaces memory accesses within a critical section with accesses to scalar temporaries that contain values loaded outside the critical section; and 3) Non-critical Code Motion (NCM), which hoists thread-local computations out of critical sections. Interprocedural analysis further increases the effectiveness of the first two transformations.
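The three transformations can be illustrated with hand-written before/after pairs. The sketch below is a minimal Java illustration under stated assumptions, not the compiler's actual implementation. It uses standard `synchronized` blocks rather than HJ `isolated` or Deuce transactions, and every name in it (`CriticalSectionSketch`, `LOCK`, `total`, `Config`, the `*Before`/`*After` and `*Total` methods) is hypothetical. The CLE rewrite assumes the hoisted load reads a `final` field that no thread can modify. The NCM rewrite assumes the input array is thread-local. The PSR rewrite assumes the lock-free path touches no shared state and that the removed critical section plays no other ordering role.

```java
// Hypothetical before/after pairs for PSR, CLE and NCM (illustration only).
class CriticalSectionSketch {
    static final Object LOCK = new Object(); // guards `total`
    static int total = 0;                    // shared state

    // Immutable configuration: final fields cannot change after construction,
    // which is the assumption that makes the CLE rewrite legal in this sketch.
    static final class Config {
        final int scale;
        Config(int scale) { this.scale = scale; }
    }
    static final Config CFG = new Config(3);

    static void reset() { synchronized (LOCK) { total = 0; } }

    // PSR: the v <= 0 path performs no shared access, so it skips the lock.
    static void psrBefore(int v) {
        synchronized (LOCK) { if (v > 0) total += v; }
    }
    static void psrAfter(int v) {
        if (v > 0) {
            synchronized (LOCK) { total += v; }
        } // else: no critical section is entered on this path
    }

    // CLE: the load of CFG.scale moves into a scalar temporary outside the lock.
    static void cleBefore(int v) {
        synchronized (LOCK) { total += v * CFG.scale; }
    }
    static void cleAfter(int v) {
        final int scale = CFG.scale; // loaded outside the critical section
        synchronized (LOCK) { total += v * scale; }
    }

    // NCM: the thread-local sum of squares is computed outside the lock.
    static void ncmBefore(int[] data) {
        synchronized (LOCK) {
            int local = 0;
            for (int x : data) local += x * x;
            total += local;
        }
    }
    static void ncmAfter(int[] data) {
        int local = 0; // computed without holding LOCK
        for (int x : data) local += x * x;
        synchronized (LOCK) { total += local; }
    }

    // Runs `work` on `threads` threads and returns the final shared total.
    static int runConcurrently(int threads, Runnable work) {
        reset();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) { ts[i] = new Thread(work); ts[i].start(); }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (LOCK) { return total; }
    }

    static int psrTotal(boolean optimized) {
        return runConcurrently(4, () -> {
            for (int i = -500; i < 500; i++) { if (optimized) psrAfter(i); else psrBefore(i); }
        });
    }
    static int cleTotal(boolean optimized) {
        return runConcurrently(4, () -> {
            for (int i = 1; i <= 100; i++) { if (optimized) cleAfter(i); else cleBefore(i); }
        });
    }
    static int ncmTotal(boolean optimized) {
        int[] data = {1, 2, 3}; // read-only input, never shared for writing
        return runConcurrently(4, () -> {
            for (int i = 0; i < 100; i++) { if (optimized) ncmAfter(data); else ncmBefore(data); }
        });
    }

    public static void main(String[] args) {
        System.out.println("PSR: " + psrTotal(false) + " vs " + psrTotal(true));
        System.out.println("CLE: " + cleTotal(false) + " vs " + cleTotal(true));
        System.out.println("NCM: " + ncmTotal(false) + " vs " + ncmTotal(true));
    }
}
```

Each `*Total(false)` / `*Total(true)` pair should produce the same final value. The optimized versions hold `LOCK` for fewer operations or, on the PSR fast path, not at all. A real compiler would have to establish these legality conditions through analysis, including the interprocedural analysis mentioned above, rather than by assumption.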
We demonstrate the effectiveness of our techniques on critical section constructs from three different explicitly parallel programming models: the isolated construct in Habanero Java (HJ), the synchronized construct in standard Java, and transactions in the Java-based Deuce software transactional memory system. We evaluated our optimizations on 17 explicitly parallel benchmark programs spanning all three models, using two SMP platforms: a 16-core Intel Xeon SMP and a 32-core IBM Power7 SMP. Our results show that the optimizations introduced in this paper can deliver measurable performance improvements, and that these improvements grow in magnitude as the program is run on more processor cores. These results underscore the importance of optimizing critical sections and suggest that the benefits of such optimizations will continue to grow as core counts increase in future many-core processors.