Stack Evaluation of Arbitrary Set-Associative Multiprocessor Caches
September 1995 (vol. 6 no. 9)
pp. 930-942

Abstract: We propose a simple solution to the problem of efficient stack evaluation of LRU multiprocessor cache memories with arbitrary set-associative mapping. It extends the existing stack evaluation techniques for set-associative LRU uniprocessor caches. Special marker entries in the stack represent data blocks (or lines) deleted by an invalidation-based cache coherence protocol, and a marker-splitting method is applied when a data block below a marker in the stack is accessed. With this technique, a single pass over a memory reference trace yields hit ratios for all cache sizes and set-associative mappings of multiprocessor caches. Simulation experiments on some multiprocessor trace data show an order-of-magnitude speed-up in simulation time using this one-pass technique.
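
The idea of a stack holding markers for invalidated blocks can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single, fully associative LRU cache, so the marker-splitting step needed for set-associative mappings does not arise, and the event format, the `MARKER` sentinel, and the function names are all hypothetical. Under these assumptions, one pass over the trace records the stack depth of each access, from which the hit ratio for any cache size follows.

```python
import math
from collections import Counter

MARKER = object()  # hypothetical sentinel: a slot emptied by an invalidation


def stack_distances(events):
    """One pass over (op, block) events, op in {"access", "invalidate"}.

    Returns a Counter mapping stack distance (1-based depth, markers
    counted as occupied depths) to the number of accesses at that depth.
    Accesses that miss at every cache size are recorded under math.inf.
    """
    stack = []  # index 0 is the most recently used position
    hist = Counter()
    for op, block in events:
        if op == "invalidate":
            # Assumed coherence action: the block is deleted and a marker
            # keeps its place, so larger caches see an empty slot there.
            if block in stack:
                stack[stack.index(block)] = MARKER
            continue
        if block in stack:
            depth = stack.index(block)
            hist[depth + 1] += 1
            # Caches large enough to hit keep the same number of holes,
            # so the accessed block leaves a marker at its old depth.
            stack[depth] = MARKER
        else:
            hist[math.inf] += 1
        # The topmost marker is the empty slot that smaller caches fill
        # on this miss; removing it shifts the blocks above it down.
        if MARKER in stack:
            del stack[stack.index(MARKER)]
        stack.insert(0, block)
    return hist


def hit_ratio(hist, cache_size):
    """Hit ratio of a fully associative LRU cache holding cache_size blocks."""
    total = sum(hist.values())
    hits = sum(n for d, n in hist.items() if d <= cache_size)
    return hits / total if total else 0.0


trace = [
    ("access", "a"), ("access", "b"), ("access", "c"),
    ("invalidate", "b"),
    ("access", "a"), ("access", "d"), ("access", "c"), ("access", "c"),
]
hist = stack_distances(trace)
for size in (1, 2, 3):
    print(size, hit_ratio(hist, size))
```

When no marker lies above the accessed block, the marker just written at its old depth is the topmost one and is removed immediately, so the update reduces to the ordinary LRU move-to-front. The list scans make each event cost time linear in the stack depth; that is acceptable for illustration only.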

Index Terms:
Cache memory, coherence by invalidation, set-associative, simulation, stack evaluation.
Citation:
Yuguang Wu, Richard Muntz, "Stack Evaluation of Arbitrary Set-Associative Multiprocessor Caches," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 9, pp. 930-942, Sept. 1995, doi:10.1109/71.466631