2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
GPU Accelerating for Rapid Multi-core Cache Simulation
Anchorage, Alaska USA
May 16-May 20
ISBN: 978-0-7695-4577-6
BibTeX:
@article{10.1109/IPDPS.2011.295,
  author = {Wan Han and Long Xiang and Gao Xiaopeng and Li Yi},
  title = {GPU Accelerating for Rapid Multi-core Cache Simulation},
  journal = {2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum},
  volume = {0},
  year = {2011},
  issn = {1530-2075},
  pages = {1387-1396},
  doi = {http://doi.ieeecomputersociety.org/10.1109/IPDPS.2011.295},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
To find the best memory system for emerging workloads, traces are obtained during an application's execution, and caches with different configurations are then simulated using these traces. Since program traces can be several gigabytes in size, simulating cache performance is a time-consuming process. Compute Unified Device Architecture (CUDA) is a software development platform that enables programmers to accelerate general-purpose applications on the graphics processing unit (GPU). This paper presents a real-time multi-core cache simulator, built on the Pin tool to capture memory references, and a fast method for multi-core cache simulation using a CUDA-enabled GPU. The proposed method is accelerated by the following techniques: execution parallelism exploration, memory latency hiding, and a novel trace compression methodology. We describe how these techniques can be incorporated into CUDA code. Experimental results show that the hybrid parallel method presented here, which combines time-partitioning with set-partitioning, achieves an 11.10x speedup over the serial CPU simulation algorithm. The simulator can characterize the cache performance of single-threaded or multi-threaded workloads at speeds of 6-15 MIPS. It can simulate 6 cache configurations in a single pass at these speeds, whereas CMP$im can simulate only one cache configuration per simulation pass at speeds of 4-10 MIPS.
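The abstract does not show how set-partitioning works, so the following is only an illustrative CPU sketch in Python, not the authors' CUDA implementation. It assumes a set-associative cache with LRU replacement (the replacement policy is an assumption; the abstract does not name one) and uses hypothetical function names. The idea it demonstrates is that accesses mapping to different cache sets never interact, so a trace can be split by set index and each set simulated independently, which is what makes the work distributable across parallel threads.

```python
from collections import OrderedDict, defaultdict


def simulate_set(tags, ways):
    """Simulate one cache set with LRU replacement; return its miss count."""
    lru = OrderedDict()  # keys are tags, ordered from least to most recent
    misses = 0
    for tag in tags:
        if tag in lru:
            lru.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used tag
            lru[tag] = True
    return misses


def split_by_set(trace, line_size, num_sets):
    """Group a trace of byte addresses into per-set tag streams (order kept)."""
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // line_size
        per_set[block % num_sets].append(block // num_sets)
    return per_set


def misses_set_partitioned(trace, line_size, num_sets, ways):
    """Total misses when each set is simulated independently.

    Each call to simulate_set touches only its own state, so on a GPU
    every set could be handled by a separate thread.
    """
    per_set = split_by_set(trace, line_size, num_sets)
    return sum(simulate_set(tags, ways) for tags in per_set.values())


def misses_serial(trace, line_size, num_sets, ways):
    """Reference: walk the trace in program order, one access at a time."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        lru, tag = sets[block % num_sets], block // num_sets
        if tag in lru:
            lru.move_to_end(tag)
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[tag] = True
    return misses


if __name__ == "__main__":
    trace = [0, 64, 128, 0, 256, 64, 512, 0]  # made-up byte addresses
    print(misses_serial(trace, 64, 2, 2))
    print(misses_set_partitioned(trace, 64, 2, 2))
```

Both functions report the same miss count for a given trace and configuration, because LRU state in one set depends only on the accesses to that set. Running `misses_set_partitioned` once per configuration over the same trace gives a simple picture of evaluating several cache configurations from a single captured trace.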
Citation:
Wan Han, Long Xiang, Gao Xiaopeng, Li Yi, "GPU Accelerating for Rapid Multi-core Cache Simulation," ipdpsw, pp.1387-1396, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2011
