A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
January 2005 (vol. 16 no. 1)
pp. 67-79
José González, IEEE Computer Society
José Duato, IEEE

Abstract: One important issue the designer of a scalable shared-memory multiprocessor must deal with is the amount of extra memory required to store the directory information. The directory memory overhead should be kept as low as possible and should scale very slowly with the size of the machine. Unfortunately, current directory architectures provide scalability at the expense of performance. This work presents a scalable directory architecture that significantly reduces the size of the directory for large-scale configurations of a multiprocessor without degrading performance. First, we propose multilayer clustering as an effective approach to reducing the width of directory entries. Based on this concept, we derive three new compressed sharing codes, some of them with a space complexity of $O(\log_2(\log_2 N))$ for an $N$-node system. Then, we present a novel two-level directory architecture that eliminates the penalty caused by compressed directories in general. The proposed organization consists of a small full-map first-level directory, which provides precise information for the most recently referenced lines, and a compressed second-level directory, which provides in-excess information (that is, a possibly overestimated set of sharers) for all the lines. The proposals are evaluated through extensive execution-driven simulations (using RSIM) of a 64-node cc-NUMA multiprocessor. Results demonstrate that a system with a two-level directory architecture achieves the same performance as a multiprocessor with a large, nonscalable full-map directory, while significantly reducing the memory overhead.
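
The two-level organization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TwoLevelDirectory`, the parameters `group_size` and `l1_entries`, the LRU replacement policy, and the rule for allocating first-level entries are all assumptions made for illustration. In particular, a simple coarse vector (one bit per group of nodes) stands in for the compressed sharing codes derived in the paper, which are not specified here.

```python
from collections import OrderedDict


class TwoLevelDirectory:
    """Toy model of one home node's directory (hypothetical names throughout)."""

    def __init__(self, num_nodes=64, group_size=4, l1_entries=2):
        self.num_nodes = num_nodes
        self.group_size = group_size    # nodes per coarse-vector bit (assumed code)
        self.l1_entries = l1_entries    # capacity of the small first level
        self.l1 = OrderedDict()         # line -> exact set of sharers, in LRU order
        self.l2 = {}                    # line -> set of group ids (compressed)

    def add_sharer(self, line, node):
        """Record that `node` now holds a copy of `line`."""
        first_sharer = line not in self.l2
        # The second level covers every line, so it is always updated.
        self.l2.setdefault(line, set()).add(node // self.group_size)
        if line in self.l1:
            self.l1[line].add(node)
            self.l1.move_to_end(line)   # mark as most recently referenced
        elif first_sharer:
            # Assumption: a precise entry is allocated only while the exact
            # sharer set is still known, i.e. on the first sharer of a line.
            if len(self.l1) >= self.l1_entries:
                self.l1.popitem(last=False)  # evict LRU; level two still covers it
            self.l1[line] = {node}

    def invalidation_targets(self, line):
        """Nodes that must receive an invalidation before a write to `line`."""
        if line in self.l1:
            return set(self.l1[line])   # precise: no unnecessary messages
        targets = set()
        for group in self.l2.get(line, set()):
            start = group * self.group_size
            targets.update(range(start, min(start + self.group_size, self.num_nodes)))
        return targets                  # in-excess: a superset of the real sharers

    def clear(self, line):
        """Forget all sharers of `line`, e.g. after invalidations complete."""
        self.l1.pop(line, None)
        self.l2.pop(line, None)
```

With a one-entry first level, a line that is still cached there yields exactly its sharers, whereas a line evicted to the second level yields every node in the sharers' groups. Those extra targets are the unnecessary coherence messages that the first level is meant to avoid for recently referenced lines.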

Index Terms:
Scalability, directory memory overhead, two-level directory architecture, compressed sharing codes, unnecessary coherence messages, cc-NUMA multiprocessor.
Manuel E. Acacio, José González, José M. García, José Duato, "A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 1, pp. 67-79, Jan. 2005, doi:10.1109/TPDS.2005.4