Issue No. 11, Nov. 2012 (vol. 61)
pp. 1624-1637
Jeongseob Ahn, KAIST, Daejeon
Daehoon Kim, KAIST, Daejeon
Jaehong Kim, KAIST, Daejeon
Jaehyuk Huh, KAIST, Daejeon
ABSTRACT
Although snoop-based coherence protocols provide fast cache-to-cache transfers with a simple and robust coherence mechanism, scaling them has been difficult because of the overheads of broadcast snooping. In this paper, we propose a coherence filtering technique called subspace snooping, which stores the potential sharers of each memory page in the page table entry. Using the sharer information in the page table entry, coherence transactions for a page send snoop requests only to a subset of the nodes in the system. However, the coherence subspace of a page may evolve, as applications change phases or the operating system migrates threads to different nodes. To adjust subspaces dynamically, subspace snooping supports two shrinking mechanisms, which remove obsolete nodes from subspaces. Of the two, subspace snooping with safe shrinking can be integrated into any type of coherence protocol and network topology, as it guarantees that a subspace always contains all the actual sharers of a page (the subspace superset property). Speculative shrinking breaks this superset property, but achieves greater snoop reductions than safe shrinking. We evaluate subspace snooping with Token Coherence on unordered mesh networks. Subspace snooping reduces snoops by 58 percent on average for a set of parallel scientific and server workloads, and by 87 percent for our multiprogrammed workloads.
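To make the filtering idea concrete, here is a minimal toy model, not the authors' implementation. It assumes a per-page sharer bit vector standing in for the hypothetical page-table-entry field, 4 KB pages, 64-byte blocks, read-only sharing (no invalidations are modeled), and a deliberately simplified form of safe shrinking: a node is dropped from a page's subspace only once it caches no block of that page, which keeps every actual sharer inside the subspace. The class name `SubspaceSnoopModel` and its methods are illustrative assumptions.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes
BLOCK_SIZE = 64   # assumed cache block size in bytes


class SubspaceSnoopModel:
    """Toy model (hypothetical): each page's PTE carries a bit vector of potential sharers."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.sharers = defaultdict(int)                   # page -> sharer bit vector (assumed PTE field)
        self.cached = [set() for _ in range(num_nodes)]   # block indices held by each node
        self.snoops = 0             # snoops sent with subspace filtering
        self.broadcast_snoops = 0   # snoops a broadcast protocol would have sent

    def access(self, node, addr):
        """Handle an access by `node` to `addr`; return the set of nodes snooped."""
        block = addr // BLOCK_SIZE
        if block in self.cached[node]:
            return set()  # hit: no coherence transaction
        page = addr // PAGE_SIZE
        # Snoop only the nodes recorded in the page's subspace, excluding the requester.
        vector = self.sharers[page] & ~(1 << node)
        targets = {n for n in range(self.num_nodes) if (vector >> n) & 1}
        self.snoops += len(targets)
        self.broadcast_snoops += self.num_nodes - 1
        # The requester becomes a potential sharer of the page.
        self.sharers[page] |= 1 << node
        self.cached[node].add(block)
        return targets

    def evict(self, node, addr):
        """Evict a block; safely shrink the subspace once the node holds no block of the page."""
        self.cached[node].discard(addr // BLOCK_SIZE)
        page = addr // PAGE_SIZE
        blocks_per_page = PAGE_SIZE // BLOCK_SIZE
        if not any(b // blocks_per_page == page for b in self.cached[node]):
            self.sharers[page] &= ~(1 << node)

    def reduction(self):
        """Fraction of broadcast snoops avoided by subspace filtering."""
        if self.broadcast_snoops == 0:
            return 0.0
        return 1 - self.snoops / self.broadcast_snoops
```

For example, on a 16-node model, `m.access(0, 0)` snoops nobody, a later `m.access(1, 64)` to the same page snoops only `{0}`, whereas a broadcast protocol would have sent 15 snoops for each miss. Speculative shrinking, which may drop a node that still caches data, is not modeled here because it needs a protocol (such as Token Coherence) that tolerates a missed sharer.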
INDEX TERMS
Coherence, Protocols, Power demand, Bandwidth, Instruction sets, System-on-a-chip, Operating systems, Multicore/single-chip multiprocessors, cache coherence, low-power design
CITATION
Jeongseob Ahn, Daehoon Kim, Jaehong Kim, Jaehyuk Huh, "Subspace Snooping: Exploiting Temporal Sharing Stability for Snoop Reduction," IEEE Transactions on Computers, vol. 61, no. 11, pp. 1624-1637, Nov. 2012, doi:10.1109/TC.2011.195