Issue No.05 - May (2012 vol.61)
pp: 593-606
A. Ros, Dept. de Informática de Sist. y Comput., Univ. Politec. de Valencia, València, Spain
One cost-effective way to meet the increasing demand for larger high-performance shared-memory servers is to build clusters with off-the-shelf processors connected with low-latency point-to-point interconnections like HyperTransport. Unfortunately, HyperTransport addressing limitations prevent building systems with more than eight nodes. While the recent High-Node Count HyperTransport specification overcomes this limitation, recently launched twelve-core Magny-Cours processors have already inherited it and provide only 3 bits to encode the pointers used by the directory cache which they include to increase the scalability of their coherence protocol. In this work, we propose and develop an external device to extend the coherence domain of Magny-Cours processors beyond the 8-node limit while maintaining the advantages provided by the directory cache. Evaluation results for systems with up to 32 nodes show that the performance offered by our solution scales with the number of nodes, enhancing the directory cache effectiveness by filtering additional messages. Particularly, we reduce execution time by 47 percent in a 32-die system with respect to the 8-die Magny-Cours configuration.
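The 8-node limit follows directly from the pointer width: 3 bits can name at most 2^3 = 8 distinct nodes, while the 32-node systems evaluated here would need 5-bit pointers. The arithmetic can be sketched as below. This is an illustration only, not the external device proposed in the paper, and the function names `max_nodes` and `pointer_bits_needed` are hypothetical.

```python
def max_nodes(pointer_bits: int) -> int:
    """Number of distinct nodes a pointer of the given width can name."""
    if pointer_bits < 0:
        raise ValueError("pointer_bits must be non-negative")
    return 1 << pointer_bits


def pointer_bits_needed(nodes: int) -> int:
    """Smallest pointer width (in bits) able to name every one of `nodes` nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # Node identifiers run from 0 to nodes - 1, so the width is the
    # bit length of the largest identifier.
    return (nodes - 1).bit_length()


if __name__ == "__main__":
    print(max_nodes(3))             # nodes reachable with a 3-bit pointer
    print(pointer_bits_needed(32))  # pointer width for a 32-node system
```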
shared memory systems, cache storage, Internet, microprocessor chips, protocols, Magny-Cours cache coherence, larger high-performance shared-memory servers, off-the-shelf processors, low-latency point-to-point interconnections, HyperTransport, twelve-core Magny-Cours processors, coherence protocol, directory cache, traffic filtering, coherence, probes, program processors, servers, scalability, proposals, high-performance computing, shared memory, cache coherence, directory protocol, coherence extension
A. Ros, "Extending Magny-Cours Cache Coherence", IEEE Transactions on Computers, vol. 61, no. 5, pp. 593-606, May 2012, doi:10.1109/TC.2011.65