1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425) (1999)
Newport Beach, California
Oct. 12, 1999 to Oct. 16, 1999
ISSN: 1089-795X
ISBN: 0-7695-0425-6
pp: 203
Mahmut Kandemir , Northwestern University
Alok Choudhary , Northwestern University
Prith Banerjee , Northwestern University
J. Ramanujam , Louisiana State University
The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coherence unit, and the spatial locality exhibited by the applications, in addition to the amount of parallelism in the applications. Large coherence units help exploit spatial locality but worsen the effects of false sharing. We present a mathematical framework that allows a clean description of the relationship between spatial locality and false sharing. We first show how to identify a severe form of multiple-writer false sharing and then demonstrate the importance of the interaction between optimization techniques aimed at enhancing locality and techniques oriented toward reducing false sharing. Given the conflicting requirements, a compiler-based approach to this problem holds promise. We investigate the use of data transformations in addressing spatial locality and false sharing, and derive an approach that balances the impact of the two. Experimental results demonstrate that such a balanced approach outperforms approaches that consider only one of these two issues. On an eight-processor SGI Origin 2000 system, our approach brings an additional 9% improvement over a powerful locality optimization technique that uses both loop and data transformations. Our approach also obtains an additional 19% improvement over an optimization technique oriented specifically toward reducing false sharing. Our study also reveals that, in addition to reducing synchronization costs and improving memory subsystem performance, obtaining large-granularity parallelism helps make these two optimizations, enhancing locality and reducing false sharing, compatible.
data reuse, cache locality, false sharing, loop and memory layout transformations, shared-memory multiprocessors
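The tension the abstract describes between coherence-unit size and multiple-writer false sharing can be illustrated with a small counting model. The sketch below is not the authors' framework. It assumes a hypothetical n x n array, a cyclic assignment of columns to processors, and a coherence unit measured in array elements, and it counts how many coherence units are written by more than one processor under a row-major and a column-major memory layout. The function name and all parameter values are chosen here for illustration only.

```python
from collections import defaultdict


def falsely_shared_lines(n, procs, line_elems, column_major):
    """Count coherence units written by more than one processor.

    Assumption (illustrative only): column j of an n x n array is
    written by processor j % procs. Every element has exactly one
    writer, so any coherence unit with several writers is falsely shared.
    """
    writers = defaultdict(set)
    for i in range(n):
        for j in range(n):
            owner = j % procs
            # Linear element offset under the chosen memory layout.
            offset = j * n + i if column_major else i * n + j
            writers[offset // line_elems].add(owner)
    return sum(1 for w in writers.values() if len(w) > 1)


# Hypothetical sizes: 64 x 64 array, 8 processors, 16 elements per unit.
row_major = falsely_shared_lines(n=64, procs=8, line_elems=16, column_major=False)
col_major = falsely_shared_lines(n=64, procs=8, line_elems=16, column_major=True)
print(row_major, col_major)
```

With these assumed sizes, every one of the 256 coherence units has multiple writers under the row-major layout, while the column-major layout leaves none shared, because each processor's columns then occupy whole coherence units. A layout change of this kind is one simple instance of a data transformation. The model counts writers only and says nothing about how the same change affects spatial locality in other loop nests, which is the trade-off the paper sets out to balance.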

J. Ramanujam, M. Kandemir, A. Choudhary and P. Banerjee, "On Reducing False Sharing While Improving Locality on Shared Memory Multiprocessors," 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425) (PACT), Newport Beach, California, 1999, pp. 203.