Silent Stores and Store Value Locality
November 2001 (vol. 50 no. 11)
pp. 1174-1190

Abstract—Value locality, a recently discovered program attribute that describes the likelihood that previously seen program values will recur, has been studied enthusiastically in the recently published literature. Much of this work has focused on refining the initial efforts at predicting load instruction outcomes, with the remainder examining the value locality of either all register-writing instructions or a focused subset of them. Surprisingly, there has been very little published characterization of, or effort to exploit, the value locality of the data words that programs store to memory. This paper presents such a characterization, including a detailed source-level analysis of the causes of silent stores (stores that write a value identical to the one already present at the target memory location). It also proposes both memory-centric (based on message passing) and producer-centric (based on program structure) prediction mechanisms for stored data values, introduces the concept of silent stores and new definitions of multiprocessor false sharing based on these observations, and suggests new techniques that adapt cache coherence protocols and microarchitectural store handling to exploit the value locality of stores. We find that realistic implementations of these techniques can significantly reduce multiprocessor data bus traffic and are more effective at reducing address bus traffic than adding an Exclusive state to an MSI coherence protocol. We also show that squashing silent stores can provide greater uniprocessor speedups than adding store-to-load forwarding.
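To make the idea of squashing a silent store concrete, here is a minimal sketch in Python of a toy write-back cache that reads the old value before each store and discards the write when the values match. It is an illustration under simplifying assumptions (every line is resident, there are no misses, no coherence protocol, and word-granularity addresses), not the mechanism evaluated in the paper; the class names, the 4-word line size, and the trace are all hypothetical. The point it shows is that a squashed store never sets the dirty bit, so the line it targets generates no writeback traffic.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLine:
    """One cache line: word values plus a dirty bit (hypothetical model)."""
    words: list[int]
    dirty: bool = False


@dataclass
class SilentStoreCache:
    """Toy write-back cache that squashes silent stores.

    Assumptions: all lines are already resident and start zeroed; there are
    no misses and no coherence actions. A store is silent when the value
    being written equals the value already stored at that address.
    """
    line_words: int = 4                      # hypothetical line size, in words
    lines: dict[int, CacheLine] = field(default_factory=dict)
    stores: int = 0
    silent: int = 0

    def _line(self, addr: int) -> CacheLine:
        tag = addr // self.line_words
        if tag not in self.lines:
            self.lines[tag] = CacheLine([0] * self.line_words)
        return self.lines[tag]

    def store(self, addr: int, value: int) -> bool:
        """Perform a store; return True if it was squashed as silent."""
        self.stores += 1
        line = self._line(addr)
        offset = addr % self.line_words
        # Read the old value first, then compare it with the new value.
        if line.words[offset] == value:
            self.silent += 1
            return True            # squashed: no write, line stays clean
        line.words[offset] = value
        line.dirty = True          # only non-silent stores dirty the line
        return False

    def writebacks(self) -> int:
        """Number of dirty lines that would have to be written back."""
        return sum(line.dirty for line in self.lines.values())


cache = SilentStoreCache()
# Hypothetical trace of (address, value) pairs; most rewrite the existing value.
trace = [(0, 0), (1, 0), (4, 7), (4, 7), (8, 0), (9, 0)]
squashed = [cache.store(a, v) for a, v in trace]
print(cache.stores, cache.silent, cache.writebacks())  # prints: 6 5 1
```

In this sketch, five of the six stores are silent, so only one of the three touched lines becomes dirty. Note that the read-before-write check is not free in real hardware: it consumes cache port bandwidth, which is part of what a realistic implementation must account for.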

Index Terms:
Value locality, value prediction, store optimization, false sharing, cache coherence.
Citation:
Kevin M. Lepak, Gordon B. Bell, Mikko H. Lipasti, "Silent Stores and Store Value Locality," IEEE Transactions on Computers, vol. 50, no. 11, pp. 1174-1190, Nov. 2001, doi:10.1109/12.966493