Issue No. 04 - April (2011 vol. 60)
ISSN: 0018-9340
pp: 472-483
Marcelo Cintra, University of Edinburgh, Edinburgh
Thomas J. Ashby, IMEC, Leuven
Pedro Díaz, University of Edinburgh, Edinburgh
ABSTRACT
Implementing shared memory consistency models on top of hardware caches gives rise to the well-known cache coherence problem. The standard solution involves implementing coherence protocols in hardware, an approach with some design complexity, hardware costs, and restrictions on interconnect behavior. However, for some memory consistency models, an alternative is to enforce coherence in the software implementation of synchronization primitives, using software controlled invalidations and forced writebacks. This requires minimal hardware support but gives less selective enforcement, which affects performance. This paper proposes a novel hybrid software-hardware coherence mechanism. In this scheme, software is responsible for triggering the coherence actions—self-invalidations and writebacks—at appropriate times while hardware uses Bloom filters to perform more selective self-invalidations. We evaluate the proposed scheme on applications from two different domains: the SPLASH-2 scientific and ALP multimedia benchmarks. Experimental results show that while the software-only coherence scheme shows less performance degradation than expected, it still unacceptably degrades performance for some of the benchmarks. Filtering out unnecessary invalidations improves the worst-case performance by as much as 93 percent, and brings the performance of the hybrid scheme within five percent of full hardware coherence for 10 out of 13 benchmarks, on a 32-core CMP with a shared L2 cache.
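The abstract does not describe how the Bloom filters are organized or where they sit in the hardware, so the following is only a minimal software sketch of the general idea, not the paper's design. It assumes that a writer records the cache-line addresses it has written in a Bloom filter, and that a reader, when software triggers a self-invalidation, drops only the cached lines that the filter reports as possibly written. The names `BloomFilter` and `selective_self_invalidate`, the filter size, and the hash construction are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over cache-line addresses (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit vector

    def _positions(self, line_addr):
        # Derive num_hashes bit positions from the address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, line_addr):
        for pos in self._positions(line_addr):
            self.bits |= 1 << pos

    def may_contain(self, line_addr):
        # True for every added address (no false negatives);
        # may also be True for addresses never added (false positives).
        return all((self.bits >> pos) & 1 for pos in self._positions(line_addr))


def selective_self_invalidate(cached_lines, written_filter):
    """Return the cache contents that survive a self-invalidation.

    Lines the filter flags as possibly written elsewhere are dropped;
    all other lines are kept.
    """
    return {
        addr: data
        for addr, data in cached_lines.items()
        if not written_filter.may_contain(addr)
    }


# Hypothetical scenario: another core wrote line 0x40 before a release.
written = BloomFilter()
written.add(0x40)

local_cache = {0x40: "possibly stale", 0x100: "private data"}
surviving = selective_self_invalidate(local_cache, written)
```

Because a Bloom filter never produces false negatives, a line that was actually written is always invalidated in this sketch; a false positive only costs an unnecessary invalidation, which affects performance but not correctness. A software-only scheme without filtering would correspond to discarding every entry of `local_cache`.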
INDEX TERMS
Multiprocessors, cache coherence.
CITATION
Marcelo Cintra, Thomas J. Ashby, Pedro Díaz, "Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters", IEEE Transactions on Computers, vol. 60, no. 4, pp. 472-483, April 2011, doi:10.1109/TC.2010.155