Today's most commercially significant microprocessors remain firmly wedded to legacy instruction set architectures, some of which are now over a decade old. Despite the admitted shortcomings of these ISAs, manufacturers are reluctant to develop radically new ones because they risk losing the commercial advantage of their product's existing software base.
On the other side of the coin, software developers find porting code to a new architecture difficult and time-consuming. If the architecture fails to gain enough market share, they—like the hardware developers—also risk losing a significant investment.
Both these factors have at times conspired to forestall innovation in processor design. They also make it more difficult for new competitors to arise. Binary translation—a set of techniques that directly translate compiled code—could help break the innovation-strangling relation between ISAs and their software base.
Designing a new ISA is expensive, particularly because it often requires recompiling several operating systems and applications. Modern languages don't define their semantics tightly enough to make recompilation transparent. In contrast, the semantics of binary code is usually well defined, facilitating automatic and transparent translation. Developers have practiced binary translation for many years, but only with recent increases in processing power has it become possible to fully use translation.
When porting legacy code from a legacy ISA to a new architecture, you can
• provide a special processor mode to execute legacy code on the new processor,
• recompile the program to the new instruction set, or
• use a variety of software methods to interpret or translate the application.
We focus in this special issue on the third class, the software methods that let you translate at runtime or while offline. Each of the three articles that follow demonstrates a different approach to software translation:
• Hewlett-Packard's dynamic translator, Aries, eases the transition of applications from HP's Precision Architecture to IA-64,
• IBM Research's BOA translates the legacy PowerPC code of a full system rather than just the application code, and
• The University of Queensland's UQBT is a binary translation system that—instead of translating between a specific pair of instruction sets—lets you generalize binary translation and work between virtually any two architectures.
Figure 1 shows the universe of these translation approaches.
Figure 1. The universe of binary translators.
Three Types of Translation
Software-based binary translation systems—including those discussed in the following articles—can be classified as emulators, dynamic translators, or static translators. An emulator interprets program instructions at runtime. The system doesn't save the interpreted instructions or cache them. Emulators are relatively easy to develop and with only modest effort can be made highly compatible with the legacy architecture.
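The emulator described above can be sketched as a plain fetch-decode-execute loop. The toy instruction set here (`li`, `add`, `addi`, `bnez`, `halt`) and the `emulate` function are hypothetical and chosen only for illustration. The point is that every instruction is decoded again each time it is reached, because nothing is saved.

```python
# Toy legacy program (hypothetical ISA): sums 5 + 4 + 3 + 2 + 1 into "acc".
SUM_TO_FIVE = [
    ("li", "acc", 0),
    ("li", "n", 5),
    ("add", "acc", "acc", "n"),   # address 2: loop head
    ("addi", "n", "n", -1),
    ("bnez", "n", 2),             # branch back to address 2 while n != 0
    ("halt",),
]


def emulate(program):
    """Interpret a program one instruction at a time, caching nothing.

    Assumes the program ends with a "halt" instruction.
    """
    regs = {}
    pc = 0
    decoded = 0  # number of decode steps, to show that work repeats
    while True:
        op, *args = program[pc]  # decode, repeated on every visit
        decoded += 1
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "bnez":
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs, decoded
```

Running `emulate(SUM_TO_FIVE)` decodes the three loop instructions on each of the five iterations, which is the overhead a dynamic translator tries to avoid.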
A dynamic translator, on the other hand, translates between the legacy and the target ISA, caching the pieces of code for future use. Java JIT (just-in-time) compilers are probably the best-known translators in this class. The sidebar "A Just-in-Time Compiler" describes Latte, a JIT compiler that generates high-quality Sparc code from Java bytecode.
A static translator translates programs offline and can apply more rigorous code optimizations than can a dynamic translator. Static translators can also use execution profiles obtained during a program's previous run.
All three approaches have limitations: Both emulation and dynamic translation impose runtime overhead, while static translation as a stand-alone tool requires end-user involvement. Much recent work in this area revolves around innovative hybrid solutions to combine the best features of each approach.
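To make the contrast with emulation concrete, the following is a minimal sketch of the caching behavior that defines a dynamic translator. It assumes a hypothetical toy instruction set and uses Python source text as a stand-in for target machine code; `translate_block` and `run` are illustrative names, not part of any system discussed in this issue.

```python
# Toy legacy program (hypothetical ISA): sums 5 + 4 + 3 + 2 + 1 into "acc".
SUM_TO_FIVE = [
    ("li", "acc", 0),
    ("li", "n", 5),
    ("add", "acc", "acc", "n"),   # address 2: loop head
    ("addi", "n", "n", -1),
    ("bnez", "n", 2),
    ("halt",),
]


def translate_block(program, start):
    """Translate one basic block into a Python function.

    The generated function updates `regs` and returns the next legacy
    address, or None on halt. Assumes every block ends in bnez or halt.
    """
    lines = ["def block(regs):"]
    pc = start
    while True:
        op, *a = program[pc]
        if op == "li":
            lines.append(f"    regs[{a[0]!r}] = {a[1]!r}")
        elif op == "add":
            lines.append(f"    regs[{a[0]!r}] = regs[{a[1]!r}] + regs[{a[2]!r}]")
        elif op == "addi":
            lines.append(f"    regs[{a[0]!r}] = regs[{a[1]!r}] + {a[2]!r}")
        elif op == "bnez":
            lines.append(f"    return {a[1]} if regs[{a[0]!r}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # "emit" the translated code
    return namespace["block"]


def run(program):
    """Execute via translated blocks, translating each block at most once."""
    cache = {}          # legacy address -> translated block
    translations = 0
    executions = 0
    regs = {}
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)
        executions += 1
    return regs, translations, executions
```

In this sketch the loop body is translated once and then reused on every later iteration, so translation cost is paid per block rather than per executed instruction.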
For binary translation to serve as a viable alternative to legacy architecture execution, the performance of the translated target executable should be competitive with the legacy architecture's performance. The legacy architecture executable has the luxury of being produced using an optimizing compilation process.
Binary translation generally does not have high-level-language code available and thus works solely from the executable code. Not knowing the full semantics of the original source code, a binary translator cannot perform many optimizations available to a compiler.
One common approach to improving binary translation performance is profile-guided optimization. As the sidebar "Collaborative Profiling" shows, execution profiles can be generated efficiently. They are then used to guide optimization performed during the translation process and during further tuning of the translated image.
A static translation system can merge multiple profiles into a single profile. A dynamic translation system can retranslate translated code periodically as the profiles change.
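As a rough illustration of merging profiles, the sketch below assumes that a profile is simply a mapping from basic-block address to execution count. That representation, the function names, and the threshold are assumptions made for the example, not a format used by any particular translator.

```python
from collections import Counter


def merge_profiles(profiles):
    """Combine per-run profiles (block address -> count) into one profile."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)  # adds counts for addresses seen in both
    return merged


def hot_blocks(profile, threshold):
    """Return block addresses executed at least `threshold` times."""
    return sorted(addr for addr, count in profile.items() if count >= threshold)


# Two hypothetical runs of the same legacy executable.
run_1 = {0x100: 50, 0x140: 3}
run_2 = {0x100: 70, 0x180: 40}
merged = merge_profiles([run_1, run_2])
```

A static translator could spend its optimization effort on `hot_blocks(merged, 40)`, while a dynamic translator could apply the same test to a live profile to decide when a block is worth retranslating.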
Sometimes a dynamic translator forms part of the translated code's execution thread, which means that execution stalls can occur during translation. Dynamic translators often make use of program behavior information to optimize execution. Hewlett-Packard's Dynamo, discussed in the sidebar "Native Binary Acceleration," is a good example of dynamic optimization.
Other optimization techniques used in binary translation include:
• ISA remapping to handle register overlaps present in the legacy ISA and remap them to the target ISA;
• basic block reordering to keep the target image execution as sequential as possible so that conditional branches will typically fall through, which helps speed instruction fetching and cache performance;
• memory coloring to improve the mapping of the translated image onto the memory hierarchy of the target environment [1]; and
• code specialization to clone procedures based on the invariance of parameter values.
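Of the techniques listed above, basic block reordering is the easiest to show in a few lines. The sketch below is one simple greedy strategy, assumed here for illustration: starting from the entry block, it always places the most frequently taken successor next, so that the hot path falls through. The edge-count format and the function name are hypothetical.

```python
def reorder_blocks(entry, edge_counts):
    """Lay out blocks so hot control-flow edges become fall-throughs.

    edge_counts maps (source_block, destination_block) -> times taken.
    """
    successors = {}
    blocks = [entry]
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((dst, count))
        for block in (src, dst):
            if block not in blocks:
                blocks.append(block)

    layout, placed = [], set()
    for seed in blocks:                 # entry first, then any leftovers
        block = seed
        while block is not None and block not in placed:
            layout.append(block)
            placed.add(block)
            candidates = [(d, c) for d, c in successors.get(block, [])
                          if d not in placed]
            # Follow the hottest edge to a block that is not yet placed.
            block = max(candidates, key=lambda dc: dc[1])[0] if candidates else None
    return layout


# Hypothetical profile: the A -> C -> D path dominates.
EDGES = {("A", "B"): 5, ("A", "C"): 95, ("B", "D"): 5, ("C", "D"): 95}
```

Here `reorder_blocks("A", EDGES)` places the rarely executed block `B` last, out of the sequential fetch path.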
Any optimization considered by the binary translation system can be used only if it preserves the correctness of the translation. Correctness and performance are often competing issues.
Since all machines—legacy and new—are Turing machines, any computation done on one can be emulated on another. Binary translation aspires to do more than emulate the legacy architecture efficiently. Binary translation seeks to emulate legacy architecture so efficiently that code runs at least as fast on the new architecture as on the legacy machine.
Attaining this goal requires careful design in several areas. Executing legacy architectures efficiently is a difficult task since architectures are never fixed. The sidebar "Translating MMX" describes one such architecture change.
The number of registers in the new machine should be at least equal to the number in the legacy machine. If the new machine has fewer registers, then some legacy register values must be kept in memory with costly loads and stores used to access them.
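The cost of a register shortfall can be seen in a small mapping sketch. It assumes, purely for illustration, that the translator ranks legacy registers by how often they are used and keeps the busiest ones in target registers, leaving the rest in 8-byte memory slots. The ranking policy, slot size, and register names are all hypothetical.

```python
def map_registers(legacy_regs, target_regs, use_counts):
    """Map each legacy register to a target register or a memory slot."""
    # Most frequently used legacy registers get real registers first.
    ranked = sorted(legacy_regs, key=lambda r: -use_counts.get(r, 0))
    mapping = {}
    for index, reg in enumerate(ranked):
        if index < len(target_regs):
            mapping[reg] = ("reg", target_regs[index])
        else:
            # Spilled: every access now costs a load or a store.
            mapping[reg] = ("mem", 8 * (index - len(target_regs)))
    return mapping


MAPPING = map_registers(
    legacy_regs=["r0", "r1", "r2", "r3"],
    target_regs=["t0", "t1"],
    use_counts={"r0": 10, "r1": 2, "r2": 40, "r3": 7},
)
```

With four legacy registers and only two target registers, `r3` and `r1` end up in memory, which is exactly the situation the text warns about.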
The system states, stored in special-purpose registers, must also be maintained in the new architecture. For example, if the legacy architecture has special segmentation registers, they should be mirrored by the target architecture. You must also deal with flag registers that maintain condition codes. One simple approach is to have the new architecture set them in the same way they are set in the legacy machine. Alternatively, a translator may be able to use a combination of redundant flag elimination plus smart flag calculation [2].
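The article does not spell out what smart flag calculation involves. One common interpretation, assumed here, is lazy evaluation: the translated code records only the operands of the last flag-setting operation, and a flag is computed only if a later instruction actually reads it. The class below is a minimal sketch for 32-bit addition; its name and methods are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class LazyFlags:
    """Defer condition-code computation until a flag is actually read."""

    def __init__(self):
        self.operands = (0, 0)
        self.evaluations = 0  # how many times a flag was really computed

    def add(self, a, b):
        """Perform a 32-bit add and remember its operands, not its flags."""
        self.operands = (a & MASK32, b & MASK32)
        return (a + b) & MASK32

    def zero(self):
        self.evaluations += 1
        a, b = self.operands
        return ((a + b) & MASK32) == 0

    def carry(self):
        self.evaluations += 1
        a, b = self.operands
        return (a + b) > MASK32


flags = LazyFlags()
first = flags.add(1, 2)             # flags never read: no flag work done
second = flags.add(0xFFFFFFFF, 1)   # overwrites the pending operands
```

Because a later flag-setting instruction simply overwrites the saved operands, the flags of the first addition are never computed at all, which has the same effect as eliminating them as redundant.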
Memory-mapped I/O presents a problem particular to whole system translators. References to I/O locations can have side effects such as injecting a packet into a network or turning on an alarm, and must be done in program order and without caching.
Many architectures contain instructions that must be executed atomically with respect to memory; this means that a second processor cannot access memory while an instruction is executed. Replicating these semantics on a different architecture is often difficult.
Noninterruptibility causes problems similar to atomicity. Some architectures, such as the IBM S/390, have complex instructions that take many cycles to execute and must execute completely or not at all. The new architecture can prerun translations of such instructions without side effects to guarantee the real run will work.
The noninterruptibility problem is one instance of the more general problem of dealing with precise exceptions for the legacy machine. For example, if a particular memory load causes an exception, the legacy exception handler will expect all instructions prior to that load to be completed but none of the instructions following the load to be completed. Precise exceptions are harder still if translated code is reordered.
Precise discovery of the legacy code also poses a problem, especially for static translation. Given an executable file, it is not always clear what is code and what is data. Although the problem of code discovery cannot be solved in general, various tools have used value tracking and other methods to solve it successfully in many practical situations.
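One widely used starting point for code discovery is to follow control flow from a known entry point and treat only reachable addresses as code. The sketch below assumes a hypothetical toy image in which each address holds one decoded word. It handles direct branches only, so indirect jumps would still need the value tracking mentioned above.

```python
def discover_code(image, entry):
    """Return the set of addresses reachable as code from `entry`.

    `image` maps address -> decoded word (opcode, operands...). Words that
    are never reached are assumed to be data, even if they decode cleanly.
    """
    code = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        while addr in image and addr not in code:
            code.add(addr)
            op, *args = image[addr]
            if op == "jmp":              # unconditional: only the target follows
                addr = args[0]
            elif op == "br":             # conditional: target and fall-through
                worklist.append(args[0])
                addr += 1
            elif op in ("ret", "halt"):  # nothing follows on this path
                break
            else:
                addr += 1
    return code


IMAGE = {
    0: ("br", 3),
    1: ("nop",),
    2: ("jmp", 5),
    3: ("nop",),
    4: ("ret",),
    5: ("halt",),
    6: ("jmp", 0),   # data that happens to decode as a jump; never reached
}
```

A linear sweep of `IMAGE` would translate address 6 as an instruction, whereas the traversal leaves it alone.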
A problem related to finding legacy code is self-referential legacy code—code that looks at itself, for example, to perform a checksum. Because of the nature of the code, a copy of the legacy program counter must be maintained by the new machine and used whenever the legacy machine references the counter for anything other than an instruction fetch.
Self-modifying code presents problems similar to finding legacy code and self-referential legacy code. Handling self-modifying code is not possible with a purely static translator, and is not always easy even with a dynamic translator: The new architecture must provide some means to detect modification of legacy code so that translations corresponding to the modified code can be invalidated.
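The invalidation step can be sketched as a translation cache that remembers which legacy pages each translation was built from. The sketch assumes that some mechanism, such as write-protecting translated pages, reports stores into legacy code by calling `on_write`. The page size, class, and method names are hypothetical.

```python
PAGE_SIZE = 4096


class TranslationCache:
    """Track translations per legacy page so writes can invalidate them."""

    def __init__(self):
        self.blocks = {}  # legacy start address -> translated code (opaque)
        self.pages = {}   # page number -> start addresses translated from it

    def insert(self, start, length, translated):
        self.blocks[start] = translated
        first_page = start // PAGE_SIZE
        last_page = (start + length - 1) // PAGE_SIZE
        for page in range(first_page, last_page + 1):
            self.pages.setdefault(page, set()).add(start)

    def on_write(self, addr):
        """Drop every translation that overlaps the page containing `addr`."""
        stale = self.pages.pop(addr // PAGE_SIZE, set())
        for start in stale:
            self.blocks.pop(start, None)
        for starts in self.pages.values():
            starts.difference_update(stale)  # block may span several pages
        return stale


cache = TranslationCache()
cache.insert(0x1000, 32, "translation A")
cache.insert(0x1FF0, 48, "translation B")   # spans pages 1 and 2
cache.insert(0x3000, 16, "translation C")
invalidated = cache.on_write(0x2004)        # store into page 2
```

Page granularity keeps the bookkeeping small at the cost of occasionally discarding translations that were not actually modified.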
Ultimately, binary translation seeks to provide the illusion of transparency: Code can run on platform A exactly as it would on platform B. Most aspects of transparency described thus far relate to instruction-set conversion. But there are several other aspects.
A user may wish to run an application compiled for one OS on a different OS. For example, you might want to run a Windows productivity application on a Unix workstation. You might want to move an application between different flavors of a single OS—a Win32 application to a 64-bit Windows environment, for example.
Translated applications can be cached in persistent storage but must preserve the normal semantics of an executable file. For example, any initializations done in invoking the executable must still be performed. Likewise, if the executable file is updated, the translation must be as well.
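One way to keep a persistent translation in step with its executable is to store a fingerprint of the legacy file alongside the translation and check it on every launch. The sketch below assumes a SHA-256 hash as the fingerprint and uses an in-memory dictionary as a stand-in for persistent storage. The class and its methods are hypothetical.

```python
import hashlib


def fingerprint(executable_bytes):
    """Identify one exact version of a legacy executable."""
    return hashlib.sha256(executable_bytes).hexdigest()


class TranslationStore:
    """Stand-in for an on-disk cache of translated executables."""

    def __init__(self):
        self._entries = {}  # path -> (fingerprint, translated image)

    def save(self, path, executable_bytes, translated):
        self._entries[path] = (fingerprint(executable_bytes), translated)

    def lookup(self, path, executable_bytes):
        """Return the cached translation, or None if missing or stale."""
        entry = self._entries.get(path)
        if entry is not None and entry[0] == fingerprint(executable_bytes):
            return entry[1]
        return None


store = TranslationStore()
store.save("/legacy/app", b"version 1", "native image for version 1")
```

After the executable is updated, `lookup` returns `None` for the new contents, which tells the system to translate again rather than run a stale image.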
A legacy platform may provide instruction semantics that must be emulated on a newer platform. For example, direct I/O instructions cannot be executed directly on the hardware in most modern OSs. Solving this issue typically requires emulating legacy hardware devices.
Complete OS emulation
OS emulation can be dealt with in several ways. One way to bypass most complications involves emulating the entire legacy OS. This approach results in a structure similar to a virtual machine.
The concept of a virtual machine dates back to the early 1960s [3] and has recently come back into vogue [4]. A classical virtual machine lets code run freely as long as no privileged instruction executes.
For a privileged instruction, a special code sequence emulates the intended operation, possibly using the primitives of an underlying OS. When integrated with binary translation, most code in a legacy OS would be translated normally, with special code sequences executed for privileged instructions. As modern virtual machines stabilize, we expect to see integrated solutions consisting of a binary translator interfaced to a virtual machine.
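The split between ordinary and privileged instructions can be sketched as follows. The toy instructions (`li`, `addi`, `out`, `cli`), the virtual machine state, and the handler names are all hypothetical. Ordinary instructions are handled inline, while privileged ones are routed to special emulation routines that update virtual state instead of touching real hardware.

```python
def emulate_out(state, port, reg):
    """Emulate a port write by logging it to a virtual device."""
    state["port_log"].append((port, state["regs"][reg]))


def emulate_cli(state):
    """Emulate disabling interrupts by clearing a virtual flag."""
    state["interrupts_enabled"] = False


# Privileged opcodes and the special code sequences that stand in for them.
PRIVILEGED = {"out": emulate_out, "cli": emulate_cli}


def run_guest(program):
    state = {"regs": {}, "port_log": [], "interrupts_enabled": True, "traps": 0}
    for op, *args in program:
        if op in PRIVILEGED:
            state["traps"] += 1
            PRIVILEGED[op](state, *args)   # emulated, never run directly
        elif op == "li":
            state["regs"][args[0]] = args[1]
        elif op == "addi":
            state["regs"][args[0]] = state["regs"][args[1]] + args[2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return state


GUEST = [
    ("li", "r1", 0x41),
    ("cli",),
    ("addi", "r1", "r1", 1),
    ("out", 0x3F8, "r1"),
]
```

In an integrated system, the unprivileged branches would be translated code rather than interpreted steps, but the dispatch to emulation routines would look much the same.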
The approach of complete OS emulation is practical if, as in the Daisy emulator [5], the new platform's whole purpose is to emulate a legacy platform using a very fast CPU. This goal is not always practical since the cost of emulating the entire OS may be prohibitive for some systems.
A more common solution involves a port of the OS to the new architecture, with binary translation employed only at the application level. The new OS would run both native and translated legacy applications.
When the OS identifies a legacy executable, it launches the translator. Afterward, most work can be done at the application level. Note that when you use static translation, the OS can still provide transparency by diverting execution from the legacy executable into the pretranslated native executable.
When you migrate an application between similar OSs—say, from one Unix to another—several issues must be resolved at the application-OS interface, including
• calling conventions, because one machine may pass parameters in registers and another may pass them on the stack;
• memory mapping, because the OS may have trouble locating the application's stack; and
• memory alignment, where different architectures have different requirements.
These three problems multiply when a translated application needs to communicate with a native application directly through shared memory or through another interprogram communication mechanism.
When the application needs to be migrated to a different platform, you can use a jacket layer to transform OS calls from the semantics of the legacy OS to the new one.
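A jacket layer can be pictured as a table of small adapters, one per legacy OS call. In the sketch below, the legacy call numbers, the reversed argument order of the legacy `write`, and the `NativeOS` stand-in are all invented for illustration. A real jacket would target the actual interfaces of the two operating systems.

```python
class NativeOS:
    """Stand-in for the new OS's native services."""

    def __init__(self):
        self.files = {1: []}

    def write(self, fd, data):
        self.files[fd].append(data)
        return len(data)

    def getpid(self):
        return 4242


# Hypothetical legacy OS call numbers.
LEGACY_WRITE = 4
LEGACY_GETPID = 20


def make_jacket(native):
    """Build a dispatcher that gives native calls legacy OS semantics."""

    def jacket_write(buf, fd):
        # Assumed legacy convention: buffer first, then file descriptor.
        return native.write(fd, buf)

    table = {
        LEGACY_WRITE: jacket_write,
        LEGACY_GETPID: native.getpid,
    }

    def legacy_syscall(number, *args):
        handler = table.get(number)
        if handler is None:
            return -1  # assumed legacy error convention
        return handler(*args)

    return legacy_syscall


native_os = NativeOS()
syscall = make_jacket(native_os)
```

The translated application keeps issuing calls such as `syscall(LEGACY_WRITE, b"hi", 1)` in its original convention, and the jacket reshapes them for the new OS.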
We have outlined several innovations in binary translation, but we believe the field still has room to improve. We need to
• improve translation from virtual instruction sets like Java virtual machines,
• provide full system translation that more efficiently translates the entire operating system,
• use more sophisticated profiling techniques to adapt to behavior changes quickly, and
• support noninterruptibility, atomicity, and multiple legacy architectures in one target.
We also need to provide additional information to the translator beyond that given by object-code semantics. Java class files have moved in this direction already, mostly to facilitate safety and security analysis. We need to develop new compilation techniques that benefit directly from dynamic code generation.
The concept for this special issue originated at the Workshop on Binary Translation, which took place in October 1999 as part of the conference on Parallel Architecture and Compilation Techniques (http://www.rose-hulman.edu/PACT/). We thank the workshop's program committee members and the organizers of PACT 99.
Erik R. Altman is in the high-performance VLSI architecture group at IBM T.J. Watson Research. He received a PhD in computer science from McGill University. Contact Altman at firstname.lastname@example.org.
David Kaeli is an associate professor at Northeastern University. He is also the director of the Northeastern University Computer Architecture Research Laboratory (NUCAR). He received a PhD in electrical engineering from Rutgers University. Contact Kaeli at email@example.com.
Yaron Sheffer is a researcher at Radguard, a network security company. Earlier he headed a binary translation project at Intel. He received a BA in computer science from the Technion and an MSc from the Hebrew University in Jerusalem. Contact Sheffer at firstname.lastname@example.org.