Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
January 1992 (vol. 3 no. 1)
pp. 25-44
A timestamp-based, software-assisted cache coherence scheme is proposed for shared memory multiprocessors. It requires no global communication to enforce the coherence of multiple private caches. The scheme is based on a compile-time marking of references and a hardware-based local incoherence detection scheme. The possible incoherence of a cache entry is detected, and the associated entry is implicitly invalidated, by comparing a clock (related to program flow) with a timestamp (related to the time of update in the cache). A performance comparison, based on trace-driven simulation using actual traces, between the proposed timestamp-based scheme and other software-assisted schemes indicates that the proposed scheme performs significantly better than previous software-assisted schemes, especially when the processors are carefully scheduled so as to maximize the reuse of cache contents. Because the scheme requires neither a shared resource nor global communication, it is scalable to a large number of processors.
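
The clock/timestamp comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual rules, so everything here is an assumption made for illustration: the names `PrivateCache`, `CacheEntry`, `advance_clock`, and `read` are hypothetical, a single clock per cache stands in for whatever clocks the paper defines, the clock ticks once per program-flow boundary, and a compiler-marked reference treats an entry as valid only if its timestamp is not older than the clock.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    value: int
    timestamp: int  # assumed: local clock value when the entry was last updated


@dataclass
class PrivateCache:
    # Assumed simplification: one local clock per cache, advanced at
    # program-flow boundaries. No other processor ever touches it.
    clock: int = 0
    entries: dict = field(default_factory=dict)

    def advance_clock(self) -> None:
        """Tick the local clock (hypothetical: once per program-flow boundary)."""
        self.clock += 1

    def read(self, addr: int, memory: dict, marked: bool):
        """Return (value, hit). `marked` models a compile-time marked reference."""
        entry = self.entries.get(addr)
        # A marked reference hits only if the entry is at least as new as the
        # clock; an older entry is treated as implicitly invalidated.
        if entry is not None and (not marked or entry.timestamp >= self.clock):
            return entry.value, True
        # Miss or implicit invalidation: refetch and stamp with the current clock.
        value = memory[addr]
        self.entries[addr] = CacheEntry(value, self.clock)
        return value, False


memory = {16: 1}
cache = PrivateCache()
print(cache.read(16, memory, marked=True))   # first access: miss
print(cache.read(16, memory, marked=True))   # same clock value: hit
memory[16] = 2                               # another processor updates memory
cache.advance_clock()                        # local program-flow boundary
print(cache.read(16, memory, marked=True))   # stale entry detected locally: miss
```

In this sketch the staleness check uses only state held in the local cache, which is the property the abstract credits for scalability. How clocks are assigned and how timestamps are set on writes are left to the paper itself.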

Index Terms:
cache contents reuse; scalable cache coherence; clocks; timestamps; multiple private caches; shared memory multiprocessors; compile-time marking; references; hardware-based local incoherence detection; program flow; trace-driven simulation; buffer storage; parallel programming; storage management
S.L. Min, J.L. Baer, "Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps," IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 1, pp. 25-44, Jan. 1992, doi:10.1109/71.113080