Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures
December 2001 (vol. 12 no. 12)
pp. 1293-1304

Very Long Instruction Word (VLIW) architectures can enhance performance by exploiting fine-grained instruction-level parallelism. In this paper, we describe a compiler-assisted multiple instruction word retry scheme for VLIW architectures. A read buffer is used to resolve the more frequent on-path hazards, while the compiler resolves the remaining branch hazards. Performance evaluation is described for 11 benchmark programs based on the IBM VLIW research compiler, Chameleon. Experimental results indicate that, for a VLIW machine with P functional units that must roll back N instruction words, a read buffer of 2NP entries combined with compiler assistance can be an effective approach, yielding low runtime overhead and small code growth for P = 4, 8, 12, and 16 and N ≤ 3.
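To make the 2NP sizing rule concrete, the following minimal sketch tabulates the read-buffer entry count for the configurations named in the abstract. The function names (`read_buffer_entries`, `size_table`) are hypothetical, and the rollback distances N = 1, 2, 3 are an assumption: the abstract states only N ≤ 3. The sketch illustrates the arithmetic alone and says nothing about how the buffer is organized in hardware.

```python
def read_buffer_entries(n_words: int, p_units: int) -> int:
    """Entry count 2*N*P for rolling back N instruction words on a
    machine with P functional units (formula taken from the abstract)."""
    if n_words < 1 or p_units < 1:
        # Assumption: both parameters are positive integers.
        raise ValueError("N and P must be positive integers")
    return 2 * n_words * p_units


def size_table(unit_counts=(4, 8, 12, 16), max_rollback=3):
    """Map (N, P) to the read-buffer size for each evaluated P and
    each assumed rollback distance N = 1..max_rollback."""
    return {
        (n, p): read_buffer_entries(n, p)
        for p in unit_counts
        for n in range(1, max_rollback + 1)
    }


if __name__ == "__main__":
    for (n, p), entries in sorted(size_table().items()):
        print(f"N={n} P={p:2d} -> {entries} entries")
```

Under these assumptions the largest configuration considered (P = 16, N = 3) would need 2 × 3 × 16 = 96 entries, and the smallest (P = 4, N = 1) would need 8.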


Index Terms:
Fault-tolerant computing, instruction retry, compilers, VLIW architectures, instruction level parallelism
S.-K. Chen, W.K. Fuchs, "Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 12, pp. 1293-1304, Dec. 2001, doi:10.1109/71.970564