ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
September 2001 (vol. 50 no. 9)
pp. 921-934

Abstract: Directories have been used to maintain cache coherency in shared memory multiprocessors with private caches. The traditional full map directory tracks the exact caching status of each shared memory block and is designed to be efficient and simple. Unfortunately, its inherent directory size explosion makes it unsuitable for large-scale multiprocessors. In this paper, we propose a new directory scheme, dubbed the associative full map directory ($ADir_pNB$), which reduces the directory storage requirement. The proposed $ADir_pNB$ uses one directory entry to maintain the sharing information for a set of exclusively cached memory blocks in a centralized linked list style. By implementing dynamic cache pointer allocation, reclamation, and replacement hints, $ADir_pNB$ can be implemented as “a full map directory with lower directory memory cost.” Our analysis indicates that, on a typical architectural paradigm, $ADir_pNB$ reduces the memory overhead of a traditional full map directory by up to 70-80 percent. In addition to the low memory overhead, we show that the proposed scheme can be implemented with appropriate protocol modifications and hardware additions. Simulation studies indicate that $ADir_pNB$ can achieve performance competitive with $Dir_pNB$. Compared with limited directory schemes, $ADir_pNB$ shows more stable and robust performance on applications across a spectrum of memory sharing and access patterns because it eliminates directory overflows. We believe that $ADir_pNB$ can be employed as a design alternative to the full map directory for moderately large-scale and fine-grain shared memory multiprocessors.
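The "directory size explosion" mentioned in the abstract can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes the textbook full map layout of one presence bit per processor for every memory block, plus a hypothetical single state bit, and the function name, the 64-byte block size, and the processor counts are illustrative choices only.

```python
def full_map_overhead(num_processors: int, block_bytes: int, state_bits: int = 1) -> float:
    """Return directory bits per data bit for a full map directory.

    Assumes one presence bit per processor for each memory block,
    plus `state_bits` extra bits of per-block state (an assumption).
    """
    directory_bits = num_processors + state_bits
    data_bits = block_bytes * 8
    return directory_bits / data_bits


if __name__ == "__main__":
    # Overhead grows linearly with the processor count for a fixed block size.
    for processors in (16, 64, 256, 1024):
        ratio = full_map_overhead(processors, block_bytes=64)
        print(f"{processors:5d} processors: {ratio:.1%} of data memory")
```

Under these assumptions the directory grows linearly with the processor count for a fixed block size, which is the cost the abstract says $ADir_pNB$ is designed to reduce.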


Index Terms:
Cache coherence, directory protocols, shared memory multiprocessors, computer architecture.
Tao Li, Lizy Kurian John, "ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols," IEEE Transactions on Computers, vol. 50, no. 9, pp. 921-934, Sept. 2001, doi:10.1109/12.954507