2011 IEEE International Conference on Cluster Computing (2011)
Austin, Texas USA
Sept. 26, 2011 to Sept. 30, 2011
In this work we provide an early performance analysis of the communication network in a small-scale POWER7-IH (P7-IH) processing system from IBM. Using a set of communication micro-benchmarks, we quantify the achievable bandwidth of the system's communication links, which differ in their peak performance characteristics. We also identify the bottlenecks within the communication network and show that the bandwidth a single node can inject into the network is considerably less than the bandwidth available to the IBM hub chip, which acts as a NIC to the node and is also an integral part of the P7-IH network. Using a communication pattern representative of activity in many scientific applications with regular communication patterns, we show that the default task-to-core assignment on the P7-IH achieves sub-optimal performance in most cases. We also show that a diagonal-cyclic assignment, developed in this work to take into account both the network topology and the routing strategy, can improve communication performance by up to 75%. We expect even greater improvements in communication performance on larger P7-IH systems.
High Performance Computing, Performance Analysis, Performance Optimization, Task Mapping
K. J. Barker and D. J. Kerbyson, "Analyzing the Performance Bottlenecks of the POWER7-IH Network," 2011 IEEE International Conference on Cluster Computing (CLUSTER), Austin, Texas USA, 2011, pp. 244-252.