2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Chicago, Illinois, October 7 to October 9, 2009
ISBN: 978-0-7695-3839-6

Resilience Challenges for Exascale Systems
DOI: http://doi.ieeecomputersociety.org/10.1109/DFT.2009.52
The combination of decreasing device reliability due to deep-submicron scaling, increasing integration, and the size of future exascale high-performance computers and cloud datacenters poses significant challenges for system resilience. Furthermore, because power and cost are critically important, resilience must be provided efficiently and economically. Although providing resilience will require a range of approaches at every level of the system stack, final responsibility rests at the system level. In addition to highlighting these challenges, this talk reviews and introduces promising system-level techniques such as configurable isolation, duplication caching, multicore DIMMs, CoVeRT, and 3D checkpointing.
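The point about system size can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the talk. It assumes independent, exponentially distributed node failures, treats any single node failure as fatal to a job spanning the whole machine, and uses Young's classic first-order approximation for the optimal checkpoint interval, T_opt ≈ sqrt(2·δ·M), where δ is the time to write one checkpoint and M is the system MTBF. The function names and all input numbers are hypothetical.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """System MTBF, assuming independent exponential node failures and that
    any one node failure interrupts the whole job (illustrative model)."""
    if node_mtbf_hours <= 0 or num_nodes <= 0:
        raise ValueError("inputs must be positive")
    # Failure rates of independent exponential components add, so MTBF divides by N.
    return node_mtbf_hours / num_nodes


def young_checkpoint_interval_hours(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval,
    T_opt ~= sqrt(2 * delta * M); only accurate when delta << M."""
    if checkpoint_cost_hours <= 0 or mtbf_hours <= 0:
        raise ValueError("inputs must be positive")
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not figures from the talk).
    node_mtbf = 5 * 365 * 24   # 5-year node MTBF, in hours
    nodes = 100_000            # hypothetical machine size
    ckpt_cost = 5 / 60         # 5-minute checkpoint write, in hours

    m = system_mtbf_hours(node_mtbf, nodes)
    t = young_checkpoint_interval_hours(ckpt_cost, m)
    print(f"system MTBF: {m:.3f} h; checkpoint roughly every {t:.3f} h")
```

With these hypothetical inputs, a 5-year node MTBF shrinks to a system MTBF of under half an hour, leaving little useful work between checkpoints. Because Young's approximation assumes δ is much smaller than M, it also becomes crude in exactly this regime, which is one way to see why cheaper checkpointing and other system-level techniques matter at exascale.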
Index Terms:
Resilience, exascale systems, isolation, duplication, checkpointing
Citation:
Norman Paul Jouppi, "Resilience Challenges for Exascale Systems," 2009 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), p. 379, 2009.