Issue No. 02 - March/April (1999 vol. 19)
The annual Hot Chips Symposium focuses on high-performance chips, systems, and related topics and emphasizes real products and real technology. The tenth meeting in the series took place at Stanford University in August 1998.
Hot Chips strives to showcase the latest developments; however, this emphasis on timeliness limits the conference proceedings to copies of the slide presentations. We invited the authors of some of the best presentations to prepare full-length versions for this special issue of IEEE Micro.
Our first article addresses the recently announced IBM S/390 G5 microprocessor. This design is impressive because it is an example of a traditional CISC architecture running at 500 MHz, which is competitive in clock speed with today's RISC processors. As integration increases, designers have more transistors at their disposal (see the Billion-Transistor Architectures special issue of Computer, Sept. 1997). The authors feel that some portion of these transistors should be used to improve the reliability and availability of computers. As we come to depend on computers for more and more aspects of our lives, this topic increases in importance.
Designers of Compaq's Alpha 21264 microprocessor wanted to obtain performance leadership by adding complexity in a regular fashion amenable to VLSI implementation. Their design goals emphasized sustained performance on large applications. This performance requires a dynamically scheduled architecture to tolerate unpredictable memory latencies, a very aggressive memory system supporting up to 32 in-flight loads and 32 in-flight stores, and prediction and speculation of many operations other than branches. The 0.35-micron 21264 can execute up to 2.4 billion instructions per second at a 600-MHz clock rate.
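The peak figure quoted for the 21264 can be checked with simple arithmetic on the two numbers given. The following is a minimal sketch in Python; the function and constant names are ours and purely illustrative, and the result is the peak rate implied by the quoted figures, not a measured sustained rate.

```python
# Figures quoted in the text for the 0.35-micron Alpha 21264.
PEAK_INSTRUCTIONS_PER_SEC = 2_400_000_000  # "up to 2.4 billion instructions per second"
CLOCK_HZ = 600_000_000                     # 600 MHz


def implied_instructions_per_cycle(peak_ips: int, clock_hz: int) -> float:
    """Return the peak number of instructions per clock cycle implied by
    a peak instruction rate and a clock frequency."""
    return peak_ips / clock_hz


ipc = implied_instructions_per_cycle(PEAK_INSTRUCTIONS_PER_SEC, CLOCK_HZ)
print(f"Implied peak instructions per cycle: {ipc}")
```

In other words, the quoted peak corresponds to four instructions completing every cycle; sustained rates on large applications depend on how well the memory system and speculation keep the machine busy.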
Computer graphics continues to increase in importance for mainstream computer systems. Addressing some of the needs of this field are the AMD 3DNow! architecture extensions and their implementation in two microprocessors. These extensions provide operations on pairs of single-precision floating-point operands with the same latency as an individual single-precision floating-point operation. This organization can be thought of as a limited form of SIMD or short vector operation, similar to the MMX instruction set extensions for small fixed-point values. The 3DNow! extensions accelerate floating-point-intensive codes, such as those needed by graphics transformations. The authors also describe the implementation of the extensions in both the K6-2 and the K7 processors.
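To make the idea of operating on pairs of single-precision operands concrete, here is a minimal sketch in Python. It models only the lane-by-lane behavior of a two-element packed operation; the function names are hypothetical, and nothing here reflects AMD's actual instruction encodings, register usage, or latencies.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest
    single-precision value by packing and unpacking it as 32 bits."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add(a, b):
    """Add two pairs of values lane by lane, rounding each lane's
    result to single precision (hypothetical helper)."""
    return tuple(to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b))


def packed_mul(a, b):
    """Multiply two pairs of values lane by lane, rounding each lane's
    result to single precision (hypothetical helper)."""
    return tuple(to_f32(to_f32(x) * to_f32(y)) for x, y in zip(a, b))


# One "packed" operation processes both lanes at once: here, scaling
# and then translating a 2D point, as a graphics transformation might.
point = (1.5, 2.0)
scaled = packed_mul(point, (2.0, 4.0))
moved = packed_add(scaled, (0.5, 0.25))
print(scaled, moved)
```

The point of the organization is that each call stands in for a single operation that produces two results in the time a scalar operation would produce one.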
The tail end of the graphics pipeline is typically a chip that renders triangles and maps textures onto the screen. The Neon graphics accelerator performs these functions and others in a particularly efficient manner. Targeted for workstation graphics, Neon uses a 256-bit-wide interface to its frame buffer (existing PC graphics cards use at most a 128-bit-wide interface). Neon is as large as modern high-performance microprocessors and performs more than 10 billion operations per second.
Another key aspect of a multimedia computer is its audio system. The EMU10K1 digital audio processor provides high-quality multichannel audio on a chip. This processor implements a 64-channel wavetable synthesizer and a 32-channel effects processor and digital audio mixer. It supports applications ranging from a home studio to environmental simulation and 3D positioning in games.
Our final article discusses the Deep Blue supercomputer, which uses 480 custom chess chips and defeated World Chess Champion Garry Kasparov in 1997. Its designer describes the chips' design philosophy, architecture, and performance. Thanks to its specialized VLSI circuits, one chess chip is equivalent to 100 billion instructions per second on a general-purpose processor. This gives the entire system the performance of a roughly 50-tera-operation/sec computer!
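The system-level figure follows from multiplying the per-chip equivalence by the chip count. A minimal sketch of that arithmetic in Python, using only the numbers quoted above (the constant names are ours and illustrative):

```python
CHESS_CHIPS = 480                             # custom chess chips in the system
EQUIV_OPS_PER_CHIP = 100 * 10**9              # 100 billion general-purpose instructions/sec per chip

# Aggregate general-purpose equivalent for the whole machine.
total_ops_per_sec = CHESS_CHIPS * EQUIV_OPS_PER_CHIP
total_tera_ops = total_ops_per_sec / 10**12   # express in tera-operations per second

print(f"Aggregate equivalent: {total_tera_ops} tera-operations/sec")
```

The product is 48 tera-operations per second, which rounds to the 50-tera-operation figure cited.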
Due to space limitations, we will place the article on the architecture of Sun Microsystems' UltraSPARC-III in the next issue (May-June) of IEEE Micro. The UltraSPARC-III strives for high performance at both the chip and system levels. For chip performance, the design provides a pipeline that is 50% deeper than those of previous SPARC processors. The system design aims to support 1,000-processor systems. The UltraSPARC-III will initially be fabricated in a 0.25-micron process.
Since the first symposium in 1989, the chips and products presented at Hot Chips have shown the success of industry and universities in converting advances in integrated circuit processing technologies into raw computing performance. There doesn't seem to be any sign that this will slow down any time soon. We look forward to the next 10 years!
This year's Hot Chips Symposium will take place 15-17 August at Stanford University. For more information on this year's meeting, please see www.hotchips.org.
We thank those reviewers who helped referee papers, the authors, and all the other people who helped put this issue together.
Norman P. Jouppi is a consulting engineer at Compaq Computer Corporation's Western Research Laboratory in Palo Alto, California. His current research interests include telepresence, multimedia, computer graphics, and computer architecture. In recent years he has worked on the architecture and implementation of advanced graphics accelerators, including Neon. Before this, he was one of the principal architects and implementers of the MIPS microprocessor. Jouppi received his PhD in electrical engineering from Stanford University and an MSEE from Northwestern University. He is a member of the IEEE and ACM. His external home page can be found at http://www.research.digital.com/wrl/people/jouppi/bio.html.
John Wawrzynek is a professor in the Computer Sciences Division of the Department of Electrical Engineering and Computer Science at the University of California, Berkeley. Currently, he is the principal investigator of the BRASS (Berkeley Reconfigurable Architectures, Software, and Systems) project. He also teaches courses in VLSI design and computer architecture. A prior research project involved the development of the first single-chip vector microprocessor. At Berkeley, he has also been involved in the application of analog VLSI technology to auditory sensory processing. Wawrzynek received his PhD from the California Institute of Technology, where he worked under Carver Mead. He holds an MS in electrical engineering from the University of Illinois, Urbana-Champaign. He is a member of the IEEE. His home page can be found at www.cs.berkeley.edu/~johnw.