The Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement Strategy to the Data Sharing Characteristics
May 1995 (vol. 6 no. 5)
pp. 470-481

Abstract—Cache coherence schemes that dynamically adapt to memory referencing patterns have been proposed to improve coherence enforcement in shared-memory multiprocessors. By using only run-time information, however, these existing schemes are incapable of looking ahead in the memory referencing stream. We present a combined hardware-software strategy that uses the predictive capability of the compiler to select updating or invalidating for each write reference. To determine the potential performance improvement that can be achieved with this optimization, three different levels of compiler capabilities are examined. Simulations using memory traces show that with an ideal compiler, this optimization can potentially reduce the miss ratio by 0.4% to 15% compared to an invalidating-only scheme, while reducing the generated network traffic by 13% to 94% compared to an updating-only scheme. In addition, this optimization can potentially reduce the miss ratio by up to 13%, while reducing the generated network traffic by up to 92%, compared to a dynamic adaptive scheme. Furthermore, performance can be potentially improved even with a compiler capable of performing only imprecise array subscript analysis and no interprocedural analysis.

Index Terms—Compiler optimization, update, invalidate, directory, cache coherence, shared-memory, multiprocessor.
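
As a rough illustration of the per-write choice between updating and invalidating described in the abstract, the toy sketch below replays a short memory trace under three policies: invalidate-only, update-only, and a look-ahead rule standing in for an ideal compiler. It is not the authors' simulator or compiler analysis. The names `simulate`, `oracle`, and `TRACE`, the infinite-cache model, the cost of one coherence message per remote sharer, and the look-ahead rule itself are all assumptions made for this example.

```python
from collections import defaultdict


def simulate(trace, choose_update):
    """Replay a trace with infinite per-processor caches (an assumed toy model).

    trace: list of (proc, op, addr) tuples, where op is "R" or "W".
    choose_update(i, others): True to update the other sharers on the write
        at trace index i, False to invalidate them.
    Returns (misses, coherence_messages).
    """
    sharers = defaultdict(set)  # addr -> processors holding a valid copy
    misses = 0
    messages = 0
    for i, (proc, op, addr) in enumerate(trace):
        holders = sharers[addr]
        if proc not in holders:
            misses += 1          # cold miss or miss caused by an invalidation
            holders.add(proc)
        if op == "W":
            others = holders - {proc}
            messages += len(others)  # one update or invalidate per remote copy
            if not choose_update(i, others):
                sharers[addr] = {proc}  # invalidate: writer keeps the only copy
    return misses, messages


def oracle(trace):
    """Hypothetical look-ahead rule standing in for an ideal compiler.

    Update only if a current sharer references the address again before the
    next write to it by the writer or by a processor without a copy.
    """
    def choose(i, others):
        addr = trace[i][2]
        for p, op, a in trace[i + 1:]:
            if a != addr:
                continue
            if p in others:
                return True   # a sharer reuses the block: keep its copy valid
            if op == "W":
                return False  # overwritten first: an update would be wasted
        return False
    return choose


# Hypothetical trace: P0 produces "x", P1 consumes it twice, then P0 keeps writing.
TRACE = [
    (0, "W", "x"), (1, "R", "x"),
    (0, "W", "x"), (1, "R", "x"),
    (0, "W", "x"), (0, "W", "x"), (0, "W", "x"),
]

if __name__ == "__main__":
    print("invalidate-only:", simulate(TRACE, lambda i, others: False))
    print("update-only:    ", simulate(TRACE, lambda i, others: True))
    print("look-ahead:     ", simulate(TRACE, oracle(TRACE)))
```

On this small trace the look-ahead policy matches the miss count of update-only and the message count of invalidate-only, which is the kind of trade-off the abstract attributes to compiler-selected writes. The rule is a heuristic that decides each write in isolation, so it will not be optimal on every trace.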

Farnaz Mounes-Toussi, David J. Lilja, "The Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement Strategy to the Data Sharing Characteristics," IEEE Transactions on Parallel and Distributed Systems, vol. 6, no. 5, pp. 470-481, May 1995, doi:10.1109/71.382316