2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2018)
Washington, DC, USA
May 1, 2018 to May 4, 2018
Recent interconnect topology designs for High Performance Computing (HPC) systems have followed two directions, one characterized by low diameter and the other by high path diversity. The low diameter design focuses on building large networks with small diameters, guaranteeing one short path between each pair of nodes. Examples include Slim Fly and Dragonfly. The high path diversity design takes into account not only topological metrics such as diameter but also the path diversity between pairs of nodes. Examples include fat-tree, Random Regular Graph (RRG), and Generalized De Bruijn Graph (GDBG). Topologies designed with these two approaches have distinct features and require very different routing schemes to exploit the network capacity. In this work, we study the performance-related topological features of representative topologies from the two design approaches, including Slim Fly, Dragonfly, RRG, and GDBG, and compare HPC application performance on these topologies under a set of routing schemes. The study uncovers new knowledge about the topologies designed by these two approaches. Findings of the study include (1) the load-balancing routing technique designed for low diameter topologies, known as Universal Globally Adaptive Load-balanced routing (UGAL), can be effectively adapted for high path diversity topologies, and (2) high path diversity topologies in general achieve higher performance than low diameter topologies for networks built from a similar number of the same type of switches.
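The two metrics that separate the design approaches, diameter and path diversity, can be illustrated on a toy graph. The following is a minimal sketch, not the method used in the study: it assumes an undirected, connected graph given as an adjacency dictionary, and it uses the number of distinct shortest paths between two nodes as one simple proxy for path diversity (the paper may measure diversity differently, for example by including non-minimal paths). The function names and the 6-node ring are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_shortest_paths(adj, src):
    """Return (dist, count) from src: hop distance to each node and the
    number of distinct shortest paths reaching it."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                # First time v is reached: it inherits u's path count.
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # Another shortest route into v through u.
                count[v] += count[u]
    return dist, count


def diameter(adj):
    """Largest shortest-path distance over all node pairs (assumes the
    graph is connected)."""
    return max(max(bfs_shortest_paths(adj, s)[0].values()) for s in adj)


# Toy topology (illustrative assumption): a 6-node ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring_diameter = diameter(ring)
_, ring_counts = bfs_shortest_paths(ring, 0)

print("diameter:", ring_diameter)
print("shortest paths from node 0 to node 3:", ring_counts[3])
```

In this toy ring, opposite nodes are connected by two equal-length shortest paths while adjacent nodes have only one, which shows how the same graph can offer different path diversity to different node pairs.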
graph theory, multiprocessor interconnection networks, network routing, network theory (graphs), network topology, parallel processing, trees (mathematics)
M. A. Mollah, P. Faizian, M. S. Rahman, X. Yuan, S. Pakin and M. Lang, "A Comparative Study of Topology Design Approaches for HPC Interconnects," 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Washington, DC, USA, 2018, pp. 392-401.