2006 International Conference on Parallel Processing Workshops (ICPPW'06)
Using Overdecomposition to Overlap Communication Latencies with Computation and Take Advantage of SMT Processors
Columbus, Ohio
August 14-August 18
ISBN: 0-7695-2637-3
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPPW.2006.77
Parallel programs running on clusters are typically decomposed and mapped to run with one thread per processor, each thread working on its own disjoint subset of the data. We evaluate the performance improvements and limitations of overdecomposition, which maps multiple threads to each processor so that computation overlaps with communication, using a microbenchmark and the NAS benchmarks. The experiment platform is a cluster of Pentium 4 simultaneous multithreading (SMT) processor nodes interconnected through Gigabit Ethernet. Microbenchmark results demonstrate execution time improvements of up to a factor of 1.8. For the NAS benchmarks, however, overdecomposition and SMT provide only slight performance gains, and sometimes cause significant performance loss. We evaluated how sensitive the improvements and limitations are to problem size, communication structure, and whether SMT is enabled. We found that performance improvements are limited by communication dependencies in the applications that limit thread-level parallelism, by an increase in cache misses, or by increased system activity. Our study contributes a better understanding of these limitations.
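The overlap idea in the abstract can be illustrated with a toy sketch. This is not the authors' microbenchmark: it assumes Python threads as a stand-in for threads on one processor, uses `time.sleep` as a placeholder for a blocking message exchange, and does not model SMT, cache effects, or real network traffic. The names `compute`, `communicate`, `worker`, and `run`, and all parameter values, are hypothetical.

```python
import threading
import time


def compute(cpu_seconds):
    """Burn roughly cpu_seconds of CPU time on the calling thread."""
    start = time.thread_time()
    while time.thread_time() - start < cpu_seconds:
        pass


def communicate(latency):
    """Placeholder for a blocking message exchange: the thread just waits."""
    time.sleep(latency)


def worker(iterations, cpu_seconds, latency):
    """One thread of the decomposed program: compute, then communicate."""
    for _ in range(iterations):
        compute(cpu_seconds)
        communicate(latency)


def run(threads_per_processor, iterations=5, total_cpu=0.04, latency=0.04):
    """Split a fixed amount of work over several threads and time the run.

    total_cpu is the CPU time per iteration for the whole processor; each
    thread gets an equal share. Every thread still pays the full latency
    per iteration, but the waits of different threads can overlap with
    the computation of the others.
    """
    share = total_cpu / threads_per_processor
    threads = [
        threading.Thread(target=worker, args=(iterations, share, latency))
        for _ in range(threads_per_processor)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("one thread per processor:   %.3f s" % run(1))
    print("four threads per processor: %.3f s" % run(4))
```

In this sketch the overdecomposed run can finish sooner only because one thread's wait hides behind another thread's computation. The limitations named in the abstract (communication dependencies, cache misses, system activity) are deliberately absent, so the sketch says nothing about how real applications such as the NAS benchmarks behave.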
Citation:
Lars Ailo Bongo, Brian Vinter, Otto J. Anshus, Tore Larsen, John Markus Bjørndalen, "Using Overdecomposition to Overlap Communication Latencies with Computation and Take Advantage of SMT Processors," 2006 International Conference on Parallel Processing Workshops (ICPPW'06), pp. 239-247, 2006.
