Manuel E. Acacio, José González, José M. García, José Duato, "A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 1, pp. 67-79, January 2005. ISSN 1045-9219. DOI 10.1109/TPDS.2005.4.

Keywords: Scalability; directory memory overhead; two-level directory architecture; compressed sharing codes; unnecessary coherence messages; cc-NUMA multiprocessor.
Abstract: One important issue the designer of a scalable shared-memory multiprocessor must deal with is the amount of extra memory required to store the directory information. It is desirable that the directory memory overhead be kept as low as possible and that it scale very slowly with the size of the machine. Unfortunately, current directory architectures provide scalability at the expense of performance. This work presents a scalable directory architecture that significantly reduces the size of the directory for large-scale configurations of a multiprocessor without degrading performance. First, we propose multilayer clustering as an effective approach to reduce the width of directory entries. Based on this concept, we derive three new compressed sharing codes, some of them with a space complexity of
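To make the width argument concrete, the sketch below compares a full-map (one presence bit per node) directory entry with a compressed entry built on a simple form of multilayer clustering. It is an illustration under stated assumptions, not the paper's exact sharing codes: N is a power of two, nodes are numbered 0..N-1, clusters are aligned power-of-two blocks of node IDs, and the stored cluster is the smallest one containing both the home node and every sharer, so only its level has to be recorded. The function names and the 64-byte memory line are hypothetical choices made for this example.

```python
from typing import Iterable


def log2_nodes(n_nodes: int) -> int:
    """log2(N); this sketch assumes N is a power of two."""
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("n_nodes must be a power of two")
    return n_nodes.bit_length() - 1


def full_map_bits(n_nodes: int) -> int:
    """Full-map (bit-vector) sharing code: one presence bit per node."""
    return n_nodes


def level_bits(n_nodes: int) -> int:
    """Bits needed to store a cluster level in the range 0..log2(N)."""
    return log2_nodes(n_nodes).bit_length()


def cluster_level(home: int, sharers: Iterable[int], n_nodes: int) -> int:
    """Hypothetical encoding: smallest level l such that the aligned block of
    2**l nodes containing `home` also contains every sharer."""
    sharers = tuple(sharers)
    for level in range(log2_nodes(n_nodes) + 1):
        if all((s >> level) == (home >> level) for s in sharers):
            return level
    raise ValueError("sharer id out of range")


def covered_nodes(home: int, level: int) -> range:
    """Decoding: every node in the stored cluster is treated as a sharer."""
    base = (home >> level) << level
    return range(base, base + (1 << level))


def overhead_pct(entry_bits: int, line_bytes: int) -> float:
    """Directory bits per memory line as a percentage of the line's data bits."""
    return 100.0 * entry_bits / (line_bytes * 8)


if __name__ == "__main__":
    LINE_BYTES = 64  # assumed memory-line size
    for n in (64, 1024):
        fm, lv = full_map_bits(n), level_bits(n)
        print(f"N={n}: full-map {fm} bits ({overhead_pct(fm, LINE_BYTES):.2f}%), "
              f"level code {lv} bits ({overhead_pct(lv, LINE_BYTES):.2f}%)")

    home, sharers = 5, {4, 6}
    lvl = cluster_level(home, sharers, 64)
    nodes = covered_nodes(home, lvl)
    extra = [n for n in nodes if n not in sharers]
    print(f"level {lvl}: invalidate {list(nodes)}, non-sharers covered: {extra}")
```

Storing only a level keeps the entry at a handful of bits even for 1024 nodes, whereas the full-map entry grows linearly with N. The price is precision: on a write, every node in the stored cluster is treated as a potential sharer, so the coarser the cluster, the more unnecessary coherence messages are sent. This is the kind of performance cost the abstract says current scalable directory designs pay.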

