V. Balasubramanian and P. Banerjee, "Tradeoffs in the Design of Efficient Algorithm-Based Error Detection Schemes for Hypercube Multiprocessors," IEEE Transactions on Software Engineering, vol. 16, no. 2, pp. 183-196, February 1990. ISSN 0098-5589. doi:10.1109/32.44381. IEEE Computer Society, Los Alamitos, CA, USA.

Keywords: hypercube multiprocessors; algorithm-based error detection; numerical linear algebra; QR factorization; encoding; checksum; sum-of-squares; 16-processor Intel iPSC2/D4/MX; error detection; linear algebra; multiprocessing systems; software engineering.

The authors provide an in-depth study of the issues and tradeoffs in algorithm-based error detection, as well as a general methodology for evaluating such schemes. They illustrate the approach on QR factorization, an extremely useful computation in numerical linear algebra. They have implemented and investigated numerous ways of applying algorithm-based error detection to QR factorization using different system-level encoding strategies; specifically, schemes based on the checksum and sum-of-squares (SOS) encoding techniques have been developed. The results of studies performed on a 16-processor Intel iPSC2/D4/MX hypercube multiprocessor are reported. It is shown that, in general, the SOS approach gives much better coverage (85-100%) for QR factorization than the checksum approach, while maintaining low overheads (below 10%).
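The two encodings named above rest on simple invariants of QR factorization. The following sketch illustrates them under our own assumptions and is not the authors' hypercube implementation. It triangularizes a small dense matrix with Givens rotations in a single process and checks both invariants. The first is a checksum column of row sums, carried through the rotations, which must still equal the row sums of R. The second is the sum of squares of each column, which orthogonal rotations leave unchanged. The function name `givens_qr_with_checks`, the `fault` injection hook, and the tolerance value are hypothetical, chosen only for illustration.

```python
import math


def givens_qr_with_checks(a, tol=1e-9, fault=None):
    """Reduce a copy of `a` (a list of m rows, m >= n) to upper-triangular R
    with Givens rotations, checking two invariants along the way.

    * Checksum: a column of row sums (A e) is appended and rotated with the
      data, so at the end it must equal the row sums of R (R e = Q^T A e).
    * Sum of squares (SOS): rotations are orthogonal, so each column's sum
      of squares must be the same in R as in A.

    `tol` is a relative tolerance for rounding error. `fault`, if given, is
    a hypothetical (after_col, row, col, delta) hook that adds `delta` to one
    data entry once column `after_col` is reduced, mimicking a transient
    error. Returns (R, checksum_ok, sos_ok).
    """
    m, n = len(a), len(a[0])
    work = [list(row) + [sum(row)] for row in a]  # data plus checksum column
    sos_before = [sum(work[i][j] ** 2 for i in range(m)) for j in range(n)]

    for j in range(n):
        # Zero the entries below the diagonal in column j, bottom to top.
        for i in range(m - 1, j, -1):
            x, y = work[i - 1][j], work[i][j]
            r = math.hypot(x, y)
            if r == 0.0:
                continue
            c, s = x / r, y / r
            # Rotate rows i-1 and i across every column, checksum included.
            for k in range(n + 1):
                u, v = work[i - 1][k], work[i][k]
                work[i - 1][k] = c * u + s * v
                work[i][k] = -s * u + c * v
        if fault is not None and fault[0] == j:
            _, row, col, delta = fault
            work[row][col] += delta

    r_mat = [row[:n] for row in work]
    checksum_ok = all(
        abs(work[i][n] - sum(r_mat[i])) <= tol * (1.0 + abs(work[i][n]))
        for i in range(m)
    )
    sos_after = [sum(r_mat[i][j] ** 2 for i in range(m)) for j in range(n)]
    sos_ok = all(
        abs(after - before) <= tol * (1.0 + before)
        for before, after in zip(sos_before, sos_after)
    )
    return r_mat, checksum_ok, sos_ok


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0], [3.0, 1.0, 2.0]]
    _, ck, sos = givens_qr_with_checks(A)
    print("fault-free:", ck, sos)
    _, ck, sos = givens_qr_with_checks(A, fault=(0, 2, 1, 5.0))
    print("with injected fault:", ck, sos)
```

On the example matrix, both checks should pass in the fault-free run and both should fail once the injected error is added. The SOS check needs no extra column because it relies only on orthogonality. The checksum check relies only on linearity, so it costs one extra rotated column. How these costs trade against coverage under rounding error on a real multiprocessor is what the paper measures. This sketch makes no claim about those figures.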