Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer
September 1995 (vol. 44 no. 9)
pp. 1096-1107

Abstract—Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. This paper describes compiler-assisted techniques to achieve multiple instruction rollback recovery. We observe that some data hazards resulting from instruction rollback can be resolved efficiently by providing an operand read buffer while others are resolved more efficiently with compiler transformations. The compiler-assisted scheme presented consists of hardware that is less complex than shadow files, history files, history buffers, or delayed write buffers, while experimental evaluation indicates performance improvement over compiler-based schemes.
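
As a rough illustration of the operand read buffer idea mentioned in the abstract, the following toy model logs every register value an instruction reads and, on rollback, writes those values back newest-first. It is only a sketch under assumptions: the class name `ReadBufferMachine`, the `window` parameter, and the restore policy are hypothetical and are not taken from the paper's actual hardware design. It covers only the case where a register is read and later overwritten inside the rollback window.

```python
from collections import deque


class ReadBufferMachine:
    """Toy register machine with an operand read buffer (illustrative only)."""

    def __init__(self, num_regs, window):
        self.regs = [0] * num_regs
        # One entry per executed instruction, holding the (register, value)
        # pairs that instruction read. Only the last `window` entries are kept.
        self.read_buffer = deque(maxlen=window)

    def execute(self, dest, srcs, fn):
        """Run one instruction: regs[dest] = fn(values of srcs)."""
        reads = [(r, self.regs[r]) for r in srcs]
        self.read_buffer.append(reads)  # log operands before the write
        self.regs[dest] = fn(*(value for _, value in reads))

    def rollback(self, n):
        """Undo the effect of the last n instructions on registers they read.

        Entries are replayed newest-first, so each register ends up with the
        value seen by its earliest read inside the rolled-back window.
        Returns the number of instructions actually rolled back.
        """
        n = min(n, len(self.read_buffer))
        for _ in range(n):
            for reg, value in reversed(self.read_buffer.pop()):
                self.regs[reg] = value
        return n


def demo():
    m = ReadBufferMachine(num_regs=4, window=4)
    m.regs[1] = 5
    m.execute(2, [1], lambda a: a + 1)  # r2 = r1 + 1 (reads r1)
    m.execute(1, [1], lambda a: a * 2)  # r1 = r1 * 2 (reads, then overwrites r1)
    before = list(m.regs)
    m.rollback(2)                       # r1 is restored from the read buffer
    after = list(m.regs)
    return before, after
```

In this model a register that is written without first being read inside the window is not restored. The abstract's remark that other hazards are handled more efficiently by compiler transformations presumably concerns cases outside what such a buffer captures, but the sketch does not model them.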

Index Terms:
Fault-tolerance, error recovery, instruction retry, compilers.
Shyh-Kwei Chen, Neal J. Alewine, W. Kent Fuchs, Wen-mei W. Hwu, "Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer," IEEE Transactions on Computers, vol. 44, no. 9, pp. 1096-1107, Sept. 1995, doi:10.1109/12.464388