Improving the Performance of Software Distributed Shared Memory with Speculation
September 2005 (vol. 16 no. 9)
pp. 885-896

Abstract—We study the performance benefits of speculation in a release consistent software distributed shared memory system. We propose a new protocol, Speculative Home-based Release Consistency (SHRC), that speculatively updates data at remote nodes to reduce the latency of remote memory accesses. Our protocol employs a predictor that uses patterns in past accesses to shared memory to predict future accesses. We have implemented our protocol in a release consistent software distributed shared memory system that runs on commodity hardware. We evaluate our protocol implementation using eight software distributed shared memory benchmarks and show that it can result in significant performance improvements.


Index Terms:
Distributed shared memory, protocol design and analysis, speculation.
Michael Kistler, Lorenzo Alvisi, "Improving the Performance of Software Distributed Shared Memory with Speculation," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 9, pp. 885-896, Sept. 2005, doi:10.1109/TPDS.2005.110