The International Conference on Dependable Systems and Networks (DSN'01)
Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice
Göteborg, Sweden
July 1-4, 2001
ISBN: 0-7695-1101-5
Abstract: In this paper, we extend the theory and practice regarding algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways. First, we propose low-overhead methods for detecting errors introduced not only in C but also in A and/or B. Second, we show that, theoretically, these methods will detect all errors as long as only one entry is corrupted. Third, we propose a low-overhead roll-back approach to correct errors once detected. Finally, we give a high-performance implementation of matrix-matrix multiplication that incorporates these error detection and correction methods. Empirical results demonstrate that these methods work well in practice while imposing an acceptable level of overhead relative to high-performance implementations without fault-tolerance.
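The abstract does not spell out the detection tests themselves, so the following is only a minimal sketch of the general idea, assuming the classic checksum test used in algorithm-based fault tolerance: with e the all-ones vector, the row sums of C should equal A(Be) and the column sums should equal (e^T A)B. The function names `matmul` and `checksum_mismatches` and the fixed tolerance `tol` are illustrative assumptions, not the authors' implementation; a real floating-point version would need a rounding-aware threshold.

```python
def matmul(A, B):
    """Plain triple-loop product C = AB for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def checksum_mismatches(A, B, C, tol=1e-9):
    """Return (bad_rows, bad_cols): indices where checksums of C disagree
    with the values predicted from A and B.

    Hypothetical helper: the tolerance and the exact form of the test are
    assumptions, not taken from the paper.
    """
    inner = len(B)

    # Row test: C e should equal A (B e).
    b_row_sums = [sum(row) for row in B]
    expected_rows = [sum(A[i][k] * b_row_sums[k] for k in range(inner))
                     for i in range(len(A))]
    actual_rows = [sum(row) for row in C]

    # Column test: e^T C should equal (e^T A) B.
    a_col_sums = [sum(col) for col in zip(*A)]
    expected_cols = [sum(a_col_sums[k] * B[k][j] for k in range(inner))
                     for j in range(len(B[0]))]
    actual_cols = [sum(col) for col in zip(*C)]

    bad_rows = [i for i, (x, y) in enumerate(zip(expected_rows, actual_rows))
                if abs(x - y) > tol]
    bad_cols = [j for j, (x, y) in enumerate(zip(expected_cols, actual_cols))
                if abs(x - y) > tol]
    return bad_rows, bad_cols


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
C[1][0] += 1  # simulate a single corrupted entry in C
bad_rows, bad_cols = checksum_mismatches(A, B, C)
if bad_rows or bad_cols:
    # Simplest possible recovery in this sketch: recompute the product.
    C = matmul(A, B)
```

With a single corrupted entry of C, the failing row and column indices intersect at that entry. Each test costs only matrix-vector work, which is small next to the product itself. Recomputing the whole product here is a stand-in for recovery; the low-overhead roll-back scheme the abstract mentions is not reproduced.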
Citation:
John A. Gunnels, Robert A. van de Geijn, Daniel S. Katz, Enrique S. Quintana-Ortí, "Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice," The International Conference on Dependable Systems and Networks (DSN'01), p. 47, 2001