Per-Node Multithreading and Remote Latency
April 1998 (vol. 47 no. 4)
pp. 414-426

Abstract—This paper evaluates the use of per-node multithreading to hide remote memory and synchronization latencies in software DSMs. As with hardware systems, multithreading in software systems can be used to reduce the costs of remote requests by running other threads when the current thread blocks. We added multithreading to the CVM software DSM and evaluated its impact on the performance of a suite of common shared memory programs. Multithreading resulted in speed improvements of at least 20 percent in two of the applications, and better than 15 percent for several other applications. However, we also found that good performance cannot always be achieved transparently for nontrivial applications. Also, the characteristics of the underlying DSM protocol can have a large effect on multithreading's utility.
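
The latency-hiding idea in the abstract (run another thread while the current one waits on a remote request) can be illustrated with a back-of-the-envelope utilization model. This is a minimal sketch under stated assumptions, not the paper's evaluation method and not CVM code: the function name `processor_utilization`, its parameters, and the simplified two-regime model are all introduced here for illustration only.

```python
def processor_utilization(threads, run_time, remote_latency, switch_cost=0.0):
    """Estimate the fraction of time a node spends on useful work.

    Hypothetical, simplified model (an assumption, not taken from the paper):
    every thread alternates `run_time` units of computation with one remote
    request lasting `remote_latency` units, and the node pays `switch_cost`
    each time it switches to another ready thread.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if run_time <= 0 or remote_latency < 0 or switch_cost < 0:
        raise ValueError("run_time must be positive; other costs non-negative")

    # Too few threads to cover the latency: the node still idles part of the
    # time, and utilization grows linearly with the number of threads.
    linear = threads * run_time / (run_time + switch_cost + remote_latency)

    # Enough threads to hide the latency completely: only the switching
    # overhead keeps utilization below 1.
    saturated = run_time / (run_time + switch_cost)

    return min(linear, saturated)


if __name__ == "__main__":
    # Example values are made up: 1 unit of work per 3 units of remote latency.
    for n in (1, 2, 4, 8):
        print(n, processor_utilization(n, run_time=1.0, remote_latency=3.0))
```

In this toy model extra threads help only until the latency is covered, and a nonzero switch cost caps the achievable benefit. It says nothing about protocol-level effects, which the abstract notes can strongly influence how useful multithreading is.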

Index Terms:
Multithreading, DSM, latency toleration
Kritchalach Thitikamol, Peter Keleher, "Per-Node Multithreading and Remote Latency," IEEE Transactions on Computers, vol. 47, no. 4, pp. 414-426, April 1998, doi:10.1109/12.675711