The International Conference on Dependable Systems and Networks (DSN'01)
Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice
Goteborg, Sweden
July 1-4, 2001
ISBN: 0-7695-1101-5
John A. Gunnels, The University of Texas at Austin
Robert A. van de Geijn, The University of Texas at Austin
Daniel S. Katz, California Institute of Technology
Enrique S. Quintana-Ortí, Universidad Jaume I
Abstract: In this paper, we extend the theory and practice regarding algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways. First, we propose low-overhead methods for detecting errors introduced not only in C but also in A and/or B. Second, we show that, theoretically, these methods will detect all errors as long as only one entry is corrupted. Third, we propose a low-overhead roll-back approach to correct errors once detected. Finally, we give a high-performance implementation of matrix-matrix multiplication that incorporates these error detection and correction methods. Empirical results demonstrate that these methods work well in practice while imposing an acceptable level of overhead relative to high-performance implementations without fault-tolerance.
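The detection and roll-back ideas in the abstract can be illustrated with a classical checksum test. The following is a minimal sketch, not the authors' implementation: it assumes all-ones checksum vectors `w` and `v`, a fixed absolute tolerance `tol`, and plain Python lists rather than a high-performance kernel. The function names (`checksum_ok`, `multiply_with_rollback`) are hypothetical. The check compares C w against A (B w) and v^T C against (v^T A) B, which costs only matrix-vector products, and the roll-back simply recomputes the product when the check fails.

```python
def matmul(A, B):
    """Plain triple-loop product C = A B on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vecmat(y, M):
    """Row-vector-matrix product y^T M."""
    return [sum(yi * m for yi, m in zip(y, col)) for col in zip(*M)]


def checksum_ok(A, B, C, tol=1e-9):
    """Return True if C passes the right and left checksum tests.

    Assumption: all-ones checksum vectors and an absolute tolerance;
    a production code would pick the threshold from a round-off bound.
    """
    w = [1.0] * len(B[0])   # right checksum vector
    v = [1.0] * len(A)      # left checksum vector
    # d = C w - A (B w): nonzero entries point at corrupted rows of C
    d = [x - y for x, y in zip(matvec(C, w), matvec(A, matvec(B, w)))]
    # e = v^T C - (v^T A) B: nonzero entries point at corrupted columns of C
    e = [x - y for x, y in zip(vecmat(v, C), vecmat(vecmat(v, A), B))]
    return max(abs(x) for x in d + e) <= tol


def multiply_with_rollback(A, B, multiply=matmul, max_retries=2):
    """Compute A B, recomputing from scratch if the checksum test fails."""
    for _ in range(max_retries + 1):
        C = multiply(A, B)
        if checksum_ok(A, B, C):
            return C
    raise RuntimeError("checksum test kept failing; giving up")
```

In this sketch a single corrupted entry of C changes exactly one row sum and one column sum, so the test flags it whenever the corruption exceeds `tol`. Detecting corruption of A or B additionally requires that the checksums be formed from uncorrupted copies of the operands, which this toy version does not model.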
Citation:
John A. Gunnels, Robert A. van de Geijn, Daniel S. Katz, Enrique S. Quintana-Ortí, "Fault-Tolerant High-Performance Matrix Multiplication: Theory and Practice," The International Conference on Dependable Systems and Networks (DSN'01), p. 47, 2001