Dynamic Binary Translation and Optimization
June 2001 (vol. 50 no. 6)
pp. 529-548

Abstract—We describe a VLIW architecture designed specifically as a target for dynamic compilation of an existing instruction set architecture. This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY (Dynamically Architected Instruction Set from Yorktown). The dynamic compiler exploits runtime profile information to optimize translations so as to extract instruction level parallelism. This work reports different design trade-offs in the DAISY system and their impact on final system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.


Index Terms:
Dynamic compilation, binary translation, dynamic optimization, just-in-time compilation, adaptive code generation, profile-directed feedback, instruction-level parallelism, very long instruction word architectures, virtual machines, instruction set architectures, instruction set layering.
Citation:
Kemal Ebcioglu, Erik Altman, Michael Gschwind, Sumedh Sathaye, "Dynamic Binary Translation and Optimization," IEEE Transactions on Computers, vol. 50, no. 6, pp. 529-548, June 2001, doi:10.1109/12.931892