Laxmi N. Bhuyan, Ravi Iyer, Hu-jun Wang, Akhilesh Kumar, "Impact of CC-NUMA Memory Management Policies on the Application Performance of Multistage Switching Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 3, pp. 230-246, March 2000. ISSN 1045-9219. DOI: http://doi.ieeecomputersociety.org/10.1109/71.841740. IEEE Computer Society, Los Alamitos, CA, USA.

Keywords: memory management, switch design, wormhole routing, execution-driven simulation, scientific applications, shared-memory multiprocessor.
Abstract: This paper studies in detail how memory management policies and switch design alternatives affect the application performance of cache-coherent nonuniform memory access (CC-NUMA) multiprocessors. Memory management plays an important role in the performance of NUMA multiprocessors because it dictates the placement of data among the distributed memory modules. We analyze memory traces of several scientific applications under three memory management policies (buddy, round-robin, and first-touch) and compare their memory system performance. We present interconnection network switch designs that use virtual channels and a varying number of input buffers per switch. Our performance evaluation uses execution-driven simulation to capture the dynamic changes in network traffic during execution of the applications. We show that cut-through switching with buffers and virtual channels can substantially reduce the average message latency. However, the choice of memory management policy affects both the amount of network traffic and the network access pattern. We therefore vary the memory management policy and confirm the performance benefits of the improved switch designs. We also present sensitivity studies that vary the switch design parameters, cache block size, and memory page size. We find that combining the first-touch memory management policy with a switch design that has virtual channels and increased buffer space can reduce the average message latency by as much as 70 percent.
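To make the placement idea concrete, here is a minimal sketch of how round-robin and first-touch placement could change the share of remote memory accesses. It is an illustration under stated assumptions, not the simulator or the traces used in the paper: the trace is synthetic, the function names (`round_robin_home`, `place_first_touch`, `remote_fraction`) are hypothetical, and the buddy policy is left out because the abstract does not define it. Caches, coherence traffic, and the switching network are not modeled.

```python
# Hypothetical illustration: page placement under two NUMA policies.
# A trace is a list of (node, page) accesses; a page's "home" is the
# node whose local memory module holds it.

def round_robin_home(page, num_nodes):
    """Round-robin: pages are dealt out to nodes cyclically by page number."""
    return page % num_nodes

def place_first_touch(trace):
    """First-touch: the first node to access a page becomes its home."""
    home = {}
    for node, page in trace:
        home.setdefault(page, node)  # only the first access sets the home
    return home

def remote_fraction(trace, home_of):
    """Fraction of accesses whose page lives on a different node."""
    remote = sum(1 for node, page in trace if home_of(page) != node)
    return remote / len(trace)

# Synthetic trace (an assumption): 4 nodes, each accessing its own
# 4 pages three times, followed by every node reading page 0 once.
NUM_NODES = 4
trace = [(n, n * 4 + p)
         for n in range(NUM_NODES)
         for p in range(4)
         for _ in range(3)]
trace += [(n, 0) for n in range(NUM_NODES)]

ft_home = place_first_touch(trace)
rr = remote_fraction(trace, lambda page: round_robin_home(page, NUM_NODES))
ft = remote_fraction(trace, ft_home.__getitem__)

print(f"round-robin remote fraction: {rr:.3f}")
print(f"first-touch remote fraction: {ft:.3f}")
```

On this synthetic trace, first-touch keeps each node's private pages local, so only the shared page generates remote accesses, while round-robin scatters private pages across nodes. Real applications with more sharing or different initialization patterns can behave differently, which is why the paper evaluates the policies on application traces.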

