Cache Invalidation Patterns in Shared-Memory Multiprocessors
July 1992 (vol. 41 no. 7)
pp. 794-810

The cache invalidation patterns of several parallel applications are analyzed. The results are based on multiprocessor simulations with 8, 16, and 32 processors. To provide deeper insight into the observed behavior, the invalidations seen in the simulations are linked to the high-level program objects that cause them. To predict what the invalidation patterns would look like beyond 32 processors, a classification scheme for the data objects found in parallel programs is proposed. The scheme provides a conceptual tool for reasoning about the invalidation patterns of parallel applications. Results indicate that it should be possible to scale well-written parallel programs to a large number of processors without an explosion in invalidation traffic. At the same time, the invalidation patterns are such that directory-based schemes with just a few pointers per entry can be very effective. The variation in invalidation behavior with cache line size is also discussed; the results indicate that line sizes in the 32-byte range yield the lowest data and invalidation traffic.
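
To make the phrase "directory-based schemes with just a few pointers per entry" concrete, the following is a minimal illustrative sketch of a limited-pointer directory entry that falls back to broadcast when it runs out of pointers. It is not the simulator or protocol used in the paper; the class `DirectoryEntry`, its fields, and the methods `record_read` and `record_write` are hypothetical names chosen for this example, and the broadcast-on-overflow policy is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Sharer list for one cache line, limited to `max_pointers` pointers.

    Hypothetical model: once more processors share the line than there are
    pointers, precise tracking is lost and the next write must broadcast.
    """
    max_pointers: int
    sharers: set = field(default_factory=set)
    overflowed: bool = False

    def record_read(self, cpu: int) -> None:
        """Note that `cpu` now holds a copy of the line."""
        if self.overflowed or cpu in self.sharers:
            return
        if len(self.sharers) < self.max_pointers:
            self.sharers.add(cpu)
        else:
            self.overflowed = True  # out of pointers: stop tracking precisely

    def record_write(self, writer: int, num_cpus: int) -> int:
        """Return the number of invalidations sent; `writer` becomes sole holder."""
        if self.overflowed:
            invalidations = num_cpus - 1  # broadcast to every other processor
        else:
            invalidations = len(self.sharers - {writer})
        self.sharers = {writer}
        self.overflowed = False
        return invalidations
```

In this model, a write to a line shared by only a few processors costs a handful of invalidations regardless of machine size, while a write to a widely shared line costs `num_cpus - 1`. The abstract's claim amounts to saying that, in well-written programs, the first case dominates.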

Index Terms:
shared-memory multiprocessors; cache invalidation patterns; simulations; high-level objects; classification scheme; data objects; parallel programs; conceptual tool; invalidation patterns; directory-based schemes; buffer storage; digital simulation; multiprocessing systems; parallel programming.
A. Gupta, W.-D. Weber, "Cache Invalidation Patterns in Shared-Memory Multiprocessors," IEEE Transactions on Computers, vol. 41, no. 7, pp. 794-810, July 1992, doi:10.1109/12.256449