Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
Dec. 2012 (vol. 23 no. 12)
pp. 2266-2279
Oreste Villa, Pacific Northwest National Laboratory, Richland
Antonino Tumeo, Pacific Northwest National Laboratory, Richland
Simone Secchi, Pacific Northwest National Laboratory, Richland
Joseph B. Manzano, Pacific Northwest National Laboratory, Richland
Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2, and XMT, appear to address irregular application requirements better than commodity clusters. However, research on massively multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, due to issues such as the size of the machines, memory footprint, simulation speed, accuracy, and customization. At the same time, Shared Memory MultiProcessors (SMPs) with multicore processors have become an attractive platform for simulating large-scale systems. This paper introduces a cycle-level simulator of the massively multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques implemented to obtain high simulation speed while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at runtime and includes a parametric network and memory model that takes into account contention and hot spotting. On a modern 48-core SMP host, the proposed infrastructure simulates a large set of irregular applications 500 to 2,000 times slower than real time when compared to a 128-processor XMT, with an accuracy error under 10 percent. Emulation is only 25 to 200 times slower than real time.
The paper also presents a case study in which the simulation infrastructure is used to identify bottlenecks in the current XMT architecture and to estimate the performance scaling of a possible multicore design with next-generation memory and network interconnect.
Index Terms:
Instruction sets, Multiprocessing systems, Multithreading, Synchronization, Multicore processing, Computational modeling, simulation of multiple-processor systems, Modeling of computer architecture, multithreaded processors, system architectures, integration and modeling, measurement, evaluation, modeling
Citation:
Oreste Villa, Antonino Tumeo, Simone Secchi, Joseph B. Manzano, "Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 12, pp. 2266-2279, Dec. 2012, doi:10.1109/TPDS.2012.70