Aniruddha S. Vaidya, Anand Sivasubramaniam, Chita R. Das, "Impact of Virtual Channels and Adaptive Routing on Application Performance," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 223-237, February 2001. doi: 10.1109/71.910875
Abstract: Research on multiprocessor interconnection networks has focused primarily on wormhole switching, virtual channel flow control, and routing algorithms as ways to improve network performance. The rationale is that reducing network latency under high network loads improves overall system performance. Many studies have used synthetic workloads to support this claim, but such workloads may not capture the behavior of real applications. In this paper, we use parallel applications to examine network behavior more closely. In particular, we examine the performance benefit of enhancing a 2D mesh with virtual channels (VCs) and a fully adaptive routing algorithm, using a set of shared-memory and message-passing applications. For the shared-memory applications, execution time and average message latency are measured using execution-driven simulation while varying many architectural attributes that affect the network workload. For the message-passing applications, communication traces collected on an IBM SP2 are used to run a trace-driven simulation of the mesh architecture to obtain message latency. Simulation results show that VCs and adaptive routing can reduce network latency to varying degrees, depending on the application. However, these modest benefits do not translate into significant improvements in overall execution time, because the load on the network is not high enough to exploit the network enhancements. Moreover, the benefit may be negated if the architectural enhancements increase the network cycle time. Emphasis should instead be placed on improving raw network bandwidth and providing faster network interfaces.
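The abstract's two closing observations can be illustrated with back-of-envelope arithmetic. The sketch below is not the paper's simulation methodology. It assumes a simple Amdahl-style model in which only the network share of execution time scales with message latency. The function names and every number are hypothetical, chosen only for illustration.

```python
def wall_clock_latency(latency_cycles: float, cycle_time_ns: float) -> float:
    """Message latency in nanoseconds: network cycles times cycle time."""
    return latency_cycles * cycle_time_ns


def execution_time_speedup(network_fraction: float, latency_ratio: float) -> float:
    """Amdahl-style speedup when only the network share of run time changes.

    network_fraction: share of baseline execution time spent on the network.
    latency_ratio: enhanced latency divided by baseline latency.
    """
    return 1.0 / ((1.0 - network_fraction) + network_fraction * latency_ratio)


# Hypothetical case 1: the enhanced router needs 15% fewer cycles per message,
# but its added complexity stretches the cycle time from 10 ns to 12 ns.
baseline_ns = wall_clock_latency(latency_cycles=100, cycle_time_ns=10.0)
enhanced_ns = wall_clock_latency(latency_cycles=85, cycle_time_ns=12.0)
print(baseline_ns, enhanced_ns)  # the enhanced router is slower in wall-clock terms

# Hypothetical case 2: cycle time is unchanged, latency drops by 15%,
# and the application spends 10% of its time on the network.
speedup = execution_time_speedup(network_fraction=0.10, latency_ratio=0.85)
print(round(speedup, 4))  # a gain of well under 2% in execution time
```

Under these assumed numbers, a 15% latency reduction improves execution time by about 1.5%, and a 20% longer cycle time more than cancels the saving in cycles. This matches the direction of the abstract's argument, not its measured results.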

