Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
August 2004 (vol. 15 no. 8)
pp. 699-712

Abstract—Distributed shared memory (DSM) multiprocessors typically require disjoint networks for deadlock-free execution of cache coherence protocols. This is normally achieved by implementing virtual networks with the help of virtual channels or virtual lanes multiplexed on a single physical network. To keep the coherence protocol simple, messages are usually assigned to virtual lanes in a predefined static manner based on a cycle-free lane assignment dependence graph. However, this static split of virtual networks (such as request and reply networks) may lead to underutilization of certain virtual networks while saturating the other networks. In this paper, we explore different static and dynamic schemes to select the virtual lanes for outgoing messages and mix the load among them without restricting any particular type of message to be carried only by a particular virtual network. We achieve this by exposing the selection algorithms to the coherence protocol itself, so that it can inject messages into selected virtual lanes based on some local information, and still enjoy deadlock-freedom. Our execution-driven simulation on five applications from the SPLASH-2 suite shows that as the system scales, the virtual network selection algorithms play an important role. For 128-node systems, our dynamic selection algorithm speeds up parallel execution by as much as 22 percent over an optimized baseline system running a modified SGI Origin 2000 protocol. We also explore how network latency, the number of message buffers per virtual lane, and the depth of network interface output queues affect the relative performance of various virtual lane selection algorithms.
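
To make the contrast between static and dynamic lane selection concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that the "local information" is the number of free message buffers in each lane's output queue, and that the protocol supplies the set of lanes (`allowed`) into which a given message may safely be injected. The names `VirtualLane`, `STATIC_LANE`, `select_static`, and `select_dynamic` are hypothetical. The sketch does not itself establish deadlock-freedom, which the abstract attributes to the coherence protocol.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualLane:
    """Hypothetical model of one virtual lane's output queue."""
    capacity: int
    queue: deque = field(default_factory=deque)

    def free_slots(self) -> int:
        return self.capacity - len(self.queue)


# Static scheme: each message type is bound to one lane ahead of time.
# The two-lane request/reply split is an assumption for illustration.
STATIC_LANE = {"request": 0, "reply": 1}


def select_static(msg_type: str) -> int:
    """Return the predefined lane for this message type."""
    return STATIC_LANE[msg_type]


def select_dynamic(lanes, allowed):
    """Pick the allowed lane with the most free buffers.

    `allowed` is assumed to be the set of lanes the protocol considers
    safe for this message. Returns None when every allowed lane is full,
    in which case the sender would have to wait.
    """
    candidates = [i for i in allowed if lanes[i].free_slots() > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: lanes[i].free_slots())
```

Under the static scheme a burst of requests can fill lane 0 while lane 1 sits idle; the dynamic selector steers the next message to whichever permitted lane is least loaded, which is the load-mixing effect the abstract describes.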

Index Terms:
Distributed shared memory, cache coherence protocol, virtual network, deadlock-freedom.
Citation:
Mainak Chaudhuri, Mark Heinrich, "Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 8, pp. 699-712, Aug. 2004, doi:10.1109/TPDS.2004.35