40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)
Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
Chicago, Illinois, USA
December 01-December 05
ISBN: 0-7695-3047-8
Karin Strauss, Xiaowei Shen, Josep Torrellas, "Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors," 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp. 327-342, 2007.
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MICRO.2007.37
Snoopy cache coherence can be implemented in any physical network topology by embedding a logical unidirectional ring in the network. Control messages are forwarded using the ring, while other messages can use any path. While the resulting coherence protocols are inexpensive to implement, they enable many ways of overlapping multiple transactions that access the same line -- making it hard to reason about correctness. Moreover, snoop requests are required to traverse the ring, therefore lengthening coherence transaction latencies. In this paper, we address these problems and make two main contributions. First, we introduce the Ordering invariant, which ensures the correct serialization of colliding transactions in embedded-ring protocols. Second, based on this invariant, we remove the requirement that snoop requests traverse the ring. Instead, they are delivered using any network path, as long as snoop responses -- which are typically off the critical path -- use the logical ring. This approach substantially reduces coherence transaction latency. We call the resulting protocol Uncorq.

We show that, on a 64-node Chip Multiprocessor (CMP), Uncorq improves the performance, on average, by 23% for SPLASH-2 applications and by 10% for commercial applications. With an additional simple prefetching optimization, the performance improvement is, on average, 26% for SPLASH-2 applications and 18% for commercial applications.
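The latency argument in the abstract can be illustrated with a toy hop-count model. The sketch below is not the paper's protocol or evaluation: it assumes a 2D mesh, a logical ring embedded in "snake" (boustrophedon) order, and one time unit per mesh hop, and it ignores the Ordering invariant, snoop responses, and contention. All function names are hypothetical. It only compares how many mesh hops a snoop request needs to reach a given node when it must follow the ring versus when it may take a shortest mesh path.

```python
def snake_coords(pos, width):
    """Map a ring position to (row, col) on the mesh (assumed snake embedding)."""
    row, offset = divmod(pos, width)
    # Even rows run left to right, odd rows right to left, so that
    # consecutive ring positions are mesh neighbours.
    col = offset if row % 2 == 0 else width - 1 - offset
    return row, col


def mesh_hops(a, b, width):
    """Shortest-path (Manhattan) distance between two ring positions on the mesh."""
    (ra, ca), (rb, cb) = snake_coords(a, width), snake_coords(b, width)
    return abs(ra - rb) + abs(ca - cb)


def ring_request_hops(src, dst, n, width):
    """Mesh hops for a request that must follow the unidirectional logical ring."""
    hops = 0
    node = src
    while node != dst:
        nxt = (node + 1) % n
        hops += mesh_hops(node, nxt, width)  # the wrap-around link may cost > 1 hop
        node = nxt
    return hops


def direct_request_hops(src, dst, width):
    """Mesh hops for a request that may use any network path (shortest assumed)."""
    return mesh_hops(src, dst, width)


if __name__ == "__main__":
    width, n = 8, 64  # 64 nodes, matching the CMP size mentioned in the abstract
    pairs = [(s, d) for s in range(n) for d in range(n) if s != d]
    ring_avg = sum(ring_request_hops(s, d, n, width) for s, d in pairs) / len(pairs)
    direct_avg = sum(direct_request_hops(s, d, width) for s, d in pairs) / len(pairs)
    print(f"average request hops via ring:   {ring_avg:.1f}")
    print(f"average request hops via direct: {direct_avg:.1f}")
```

In this model the direct path is never longer than the ring path, which is the intuition behind letting requests leave the ring; the hop counts say nothing about the performance figures reported in the abstract.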
