Cluster Computing and the Grid, IEEE International Symposium on (2011)
Newport Beach, California USA
May 23, 2011 to May 26, 2011
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved towards the exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoiding network hot-spots and improving scalability. Parallel simulation is a promising approach, which has been extensively used to model the performance of such large-scale machines. One of the most critical factors in coping with the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class. In this paper, we discuss the development of a network contention model for a full-system XMT simulator. We start by measuring the effects of network contention on a 128-processor XMT machine. We then investigate the trade-off between simulation accuracy and speed by comparing three network models that operate at different levels of accuracy. The comparison and model validation are performed by executing a string-matching algorithm on the full-system simulator and on the actual machine, using three datasets that generate noticeably different contention patterns. Results show that the simulator's execution-time accuracy remains within 10% of the real machine. We also show that the slowdown due to contention modeling is limited to 20% when simulating a small number of processors, and becomes negligible for simulations with higher processor counts.
Network Modeling, Multi-threading, Cray XMT supercomputer, Parallel Simulation, Irregular Applications
A. Tumeo, O. Villa and S. Secchi, "Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT," Cluster Computing and the Grid, IEEE International Symposium on (CCGRID), Newport Beach, California USA, 2011, pp. 275-284.