Analytical Prediction of Performance for Cache Coherence Protocols
November 1997 (vol. 46 no. 11)
pp. 1155-1173

Abstract—In this paper, we introduce new analytical models for predicting the performance of parallel applications under various cache coherence protocol assumptions. The purpose of these models is to determine which protocols are to be used for which data blocks, and, in the case of dynamic protocols, also to determine when to change protocols. Although we focus on tightly-coupled multiprocessor systems, similar models can be derived for loosely-coupled distributed systems, such as networks of workstations.

Our models are unique in that they lie between a large body of theoretical models that assume independence and a uniform distribution of memory accesses across processors, and a large body of address-trace oriented models that assume the availability of a precise characterization of interleaving behavior of memory accesses. The former are not very realistic, and the latter are not suitable for compile-time and run-time usage. In contrast, our models enable us to choose different input parameters depending on how the models will be used and depending on the needed accuracy in performance prediction.

We present the models and show how the required parameters can be obtained. We assess the accuracy of our models on 15 parallel applications. For these applications, our most complete model predicts performance within a 10 percent margin when compared to a simulation of a sequentially consistent multiprocessor system. As part of this study, we also show the potential advantage of using dynamic hybrid protocols.

Index Terms:
Cache coherence, distributed shared memory, memory access behavior, analytical performance prediction, performance evaluation, dynamic hybrid protocols.
Citation:
Sinisa Srbljic, Zvonko G. Vranesic, Michael Stumm, Leo Budin, "Analytical Prediction of Performance for Cache Coherence Protocols," IEEE Transactions on Computers, vol. 46, no. 11, pp. 1155-1173, Nov. 1997, doi:10.1109/12.644291