Issue No. 02 - February (2011 vol. 22)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2010.73
Michihiro Koibuchi, National Institute of Informatics (NII), Tokyo
Tomohiro Otsuka, Keio University, Yokohama
Tomohiro Kudoh, National Institute of Advanced Industrial Science and Technology, Tsukuba
Hideharu Amano, Keio University, Yokohama
Ethernet has been used for connecting hosts in PC clusters, besides its use in local area networks. Although a layer-2 Ethernet topology is limited to a tree structure because of the need to avoid broadcast storms and deadlocks of frames, various deadlock-free routing algorithms on topologies that include loops suitable for parallel processing can be employed by the application of IEEE 802.1Q VLAN technology. However, the MPI communication libraries used in current PC clusters do not always support tagged VLAN technology; therefore, at present, the design of VLAN-based Ethernet cannot be applied to such PC clusters. In this study, we propose a switch-tagged routing methodology in order to implement various deadlock-free routing algorithms on such PC clusters by using at most the same number of VLANs as the degree of a switch. Since the MPI communication libraries do not need to perform VLAN operations, the proposed methodology has advantages in both simple host configuration and high portability. In addition, when it is used with on/off and multispeed link regulation, the power consumption of Ethernet switches can be reduced. Evaluation results using NAS parallel benchmarks showed that the performance of the topologies that include loops using the proposed methodology was comparable to that of an ideal one-switch (full crossbar) network, and the torus topology in particular had up to a 27 percent performance improvement compared with a tree topology with link aggregation.
Ethernet, routing, deadlock avoidance, interconnection networks, PC clusters.
M. Koibuchi, T. Otsuka, T. Kudoh and H. Amano, "A Switch-Tagged Routing Methodology for PC Clusters with VLAN Ethernet," in IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 2, pp. 217-230, 2011.