Issue No.09 - September (2009 vol.20)
pp: 1285-1298
Salvador Coll, Universidad Politecnica de Valencia, Valencia
Francisco J. Mora, Universidad Politecnica de Valencia, Valencia
Jose Duato, Universidad Politecnica de Valencia, Valencia
Fabrizio Petrini, IBM TJ Watson Research Center, Yorktown Heights
This article presents an efficient and scalable mechanism to overcome the limitations of collective communication in switched interconnection networks in the presence of faults. As supercomputing moves toward massively parallel computers with many thousands of components, reliability becomes a challenge. In such a scenario, fat-tree networks that provide hardware support for collective communication suffer serious performance degradation in the presence of even a single faulty node. This paper describes a new mechanism to provide high-performance collective communication in such situations. The feasibility of the proposed technique is formally demonstrated. We present the design of a new hardware-based routing algorithm for multicast, which forms the basis of our proposal. The proposed mechanism is implemented and experimentally evaluated. Our experimental results show that hardware-based multicast trees provide an efficient and scalable solution for collective communication in fat-tree networks, significantly outperforming traditional solutions.
Multicast, data communications, interprocessor communications, network communication, network problems, trees.
Salvador Coll, Francisco J. Mora, Jose Duato, Fabrizio Petrini, "Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks", IEEE Transactions on Parallel & Distributed Systems, vol.20, no. 9, pp. 1285-1298, September 2009, doi:10.1109/TPDS.2008.228