High Performance Computing and Grid in Asia Pacific Region, International Conference on (2004)
Omiya Sonic City, Tokyo, Japan
July 20, 2004 to July 22, 2004
ISBN: 0-7695-2138-X
pp: 98-103
Dan Meng, Chinese Academy of Sciences, Beijing, P.R. China
Xiaocheng Zhou, Chinese Academy of Sciences, Beijing, P.R. China
Jie Ma, Chinese Academy of Sciences, Beijing, P.R. China
Zhigang Huo, Chinese Academy of Sciences, Beijing, P.R. China
ABSTRACT
As CLUMPS become the mainstream of clusters and the number of nodes per cluster grows, the communication system used in clusters needs higher bandwidth and better availability. Parallel communication over multiple system area networks (SANs) can meet these requirements. This paper introduces the parallel communication protocol used in BCL-4, a highly efficient communication system used in DAWNING-4000A, a large-scale Linux cluster. The protocol dispatches small messages, and sub-messages striped from large messages, across multiple SANs while preserving the existing communication semantics. The parallel communication process is transparent to both users and the control program on the network interface card (NIC). It also provides an efficient load-balancing mechanism. With the parallel communication protocol, BCL-4 provides key features such as multiplied throughput, high availability, and backward compatibility. Experimental results show that the peak bandwidth of BCL-4 over two Myrinet networks is 494.7 MB/s, almost twice that over one, while adding only 0.02 µs of overhead for short messages.
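The dispatching idea in the abstract can be illustrated with a minimal sketch. This is not the BCL-4 implementation: the threshold, the chunk size, the least-queued-bytes balancing rule, and every name below (Rail, dispatch, reassemble) are assumptions chosen for illustration only. The sketch sends a small message whole, splits a large one into sub-messages, places each on whichever network currently has the least queued data, and reassembles by offset so the receiver sees the original message.

```python
from dataclasses import dataclass, field

# Hypothetical tuning values; the abstract does not state BCL-4's actual sizes.
STRIPE_THRESHOLD = 4096  # messages at or below this size are sent whole
CHUNK_SIZE = 4096        # size of each sub-message in bytes


@dataclass
class Rail:
    """One system area network, e.g. a single Myrinet link (illustrative model)."""
    name: str
    queued_bytes: int = 0
    sent: list = field(default_factory=list)  # entries: (msg_id, offset, payload)


def dispatch(msg_id, data, rails):
    """Send a small message whole; split a large one across the rails."""
    if len(data) <= STRIPE_THRESHOLD:
        chunks = [(0, data)]
    else:
        chunks = [(off, data[off:off + CHUNK_SIZE])
                  for off in range(0, len(data), CHUNK_SIZE)]
    for off, payload in chunks:
        # Assumed load-balancing rule: pick the rail with the least queued data.
        rail = min(rails, key=lambda r: r.queued_bytes)
        rail.sent.append((msg_id, off, payload))
        rail.queued_bytes += len(payload)


def reassemble(msg_id, rails):
    """Collect the sub-messages of one message from all rails, ordered by offset."""
    parts = [(off, payload)
             for rail in rails
             for (mid, off, payload) in rail.sent
             if mid == msg_id]
    parts.sort(key=lambda part: part[0])
    return b"".join(payload for _, payload in parts)


if __name__ == "__main__":
    rails = [Rail("san0"), Rail("san1")]
    message = bytes(range(256)) * 64  # 16 KiB, large enough to be split
    dispatch(1, message, rails)
    assert reassemble(1, rails) == message
    print({r.name: r.queued_bytes for r in rails})
```

With two idle rails and equal-sized sub-messages, this rule alternates between the rails, so each carries half of a large message. A real protocol would also need to handle a failed network and in-order delivery on the NIC, which the sketch leaves out.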
CITATION
Dan Meng, Xiaocheng Zhou, Jie Ma, Zhigang Huo, "The Parallel Communication Protocol in BCL-4", High Performance Computing and Grid in Asia Pacific Region, International Conference on, pp. 98-103, 2004, doi:10.1109/HPCASIA.2004.1324022