International Conference on High Performance Computing and Grid in Asia Pacific Region (2004)
Omiya Sonic City, Tokyo, Japan
July 20, 2004 to July 22, 2004
Xiaocheng Zhou, Chinese Academy of Sciences, Beijing, P.R. China
Zhigang Huo, Chinese Academy of Sciences, Beijing, P.R. China
Jie Ma, Chinese Academy of Sciences, Beijing, P.R. China
Dan Meng, Chinese Academy of Sciences, Beijing, P.R. China
As CLUMPs (clusters of SMPs) become the mainstream cluster architecture and the number of nodes per cluster grows, the communication system used in a cluster must deliver higher bandwidth and availability. Parallel communication over multiple system area networks (SANs) can meet these requirements. This paper introduces the parallel communication protocol used in BCL-4, a highly efficient communication system for DAWNING-4000A, a large-scale Linux cluster. The protocol dispatches small messages, as well as sub-messages split from large messages, across multiple SANs while preserving the original communication semantics. The parallel communication process is transparent both to users and to the control program running on the network interface card (NIC), and the protocol also provides an efficient load-balancing mechanism. With this protocol, BCL-4 offers several key features, including throughput that scales with the number of networks, high availability, and backward compatibility. Experimental results show that the peak bandwidth of BCL-4 over two Myrinet networks is 494.7 MB/s, almost twice that over a single network, while the added overhead for short messages is only 0.02 µs.
D. Meng, X. Zhou, J. Ma, and Z. Huo, "The Parallel Communication Protocol in BCL-4," in International Conference on High Performance Computing and Grid in Asia Pacific Region (HPCASIA), Omiya Sonic City, Tokyo, Japan, 2004, pp. 98-103.