Speculative Versioning Cache
December 2001 (vol. 12 no. 12)
pp. 1305-1317

Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction-level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation, which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the Address Resolution Buffer (ARB), uses a centralized buffer to support speculative versions. Our proposal, called the Speculative Versioning Cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. Our evaluation for the Multiscalar architecture shows that hit latency is an important factor affecting performance and that private cache solutions trade off hit rate for hit latency.
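
The idea of speculative versions ordered by program order can be illustrated with a small software model. The sketch below is only a conceptual toy, not the SVC protocol or the ARB design: the class name `VersionedMemory`, its methods, and the use of integer task ids (smaller id means earlier in program order) are assumptions introduced for illustration. A load returns the closest preceding version, and a store reports later tasks that already loaded a stale version and would therefore need to be squashed.

```python
from collections import defaultdict


class VersionedMemory:
    """Toy model of speculative versioning (hypothetical, not the SVC protocol).

    Assumption: tasks are numbered so that a smaller id is earlier in
    program order, and commit() is called in that order.
    """

    def __init__(self, committed=None):
        self.committed = dict(committed or {})   # architectural (non-speculative) state
        self.versions = defaultdict(dict)        # addr -> {task id: speculative value}
        self.readers = defaultdict(dict)         # addr -> {task id: id of version read, or None}

    def load(self, task, addr):
        # The correct value is the closest version at or before this task.
        earlier = [t for t in self.versions[addr] if t <= task]
        source = max(earlier) if earlier else None
        if source != task:
            # Remember which version was consumed so later stores can check it.
            self.readers[addr].setdefault(task, source)
        if source is None:
            return self.committed.get(addr, 0)
        return self.versions[addr][source]

    def store(self, task, addr, value):
        # Each storing task gets its own speculative version of the location.
        self.versions[addr][task] = value
        # A later task that read an older (or overwritten) version saw stale data.
        return sorted(
            r for r, src in self.readers[addr].items()
            if r > task and (src is None or src <= task)
        )

    def commit(self, task):
        # Make the task's versions architectural and forget its bookkeeping.
        for addr, per_task in self.versions.items():
            if task in per_task:
                self.committed[addr] = per_task.pop(task)
        for per_task in self.readers.values():
            per_task.pop(task, None)


mem = VersionedMemory({0x40: 1})
print(mem.store(1, 0x40, 10))   # [] : no later task has loaded yet
print(mem.load(3, 0x40))        # 10 : task 3 reads task 1's version
print(mem.store(2, 0x40, 20))   # [3] : task 3 should have seen task 2's version
print(mem.load(0, 0x40))        # 1 : task 0 precedes all speculative versions
```

This model keeps every version in one shared table, which is closer in spirit to a centralized buffer; the abstract's point is that the SVC distributes this bookkeeping across private caches instead.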

Index Terms:
Speculative memory, memory disambiguation, snooping cache coherence protocols, speculative versioning
T.N. Vijaykumar, S. Gopal, J.E. Smith, G. Sohi, "Speculative Versioning Cache," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 12, pp. 1305-1317, Dec. 2001, doi:10.1109/71.970565