Issue No. 02 - March/April (2004 vol. 24)
Michael Flynn, Stanford University
Pradeep Dubey, Intel Corp.
Just as technology continues to evolve and provide improved transistor density and performance, so, too, chip architects continue to press forward on chip functionality and cost performance. The most notable trends at the 15th annual Hot Chips conference at Stanford University, held in the summer of 2003, were the increasing emphasis on power and on ways of achieving performance. Today, low power and power awareness are "in" topics. Few Hot Chips 15 presentations ignored the imminent limitations that technology increasingly imposes. As for performance, concurrency is also in vogue, especially in the form of simultaneous multithreading (SMT) and dual-core processors on the same die. Several new designs introduce 2 × 2 chips (two threads on each of two processors) on the same die. This design seems to provide the best performance per square millimeter of silicon, at least for now. The increasing use of large third-level caches is the other notable trend for high-performance designs. These 6- to 36-Mbyte caches are the price of high performance, insulating the processor from the latencies of memory access.
As with Hot Chips 14 in 2002, there is now increasing emphasis on system-on-chip (SoC) implementations. Probably more than half of the 2003 presentations concerned SoCs.
MAKING CHIPS WORK
As is the custom, the Hot Chips 15 conference opened with two tutorials, both providing fascinating insights into the other side of chip design. The first tutorial was on test and reliability, focusing on techniques for robust system design. Subhasish Mitra of Intel said that as die complexity increases, design quality and reliability become indispensable features, just like power and performance. These features are achievable only with a design methodology that's totally integrated into the logic and system design process. Moreover, as designs become larger, standard design techniques such as scan are about to hit a time-and-cost wall. Using traditional scan takes gigabytes of test data and minutes of time to test complex designs with long scan chains. Mitra's presentation highlighted the need for built-in self-test, soft-error detection, concurrent error detection, and self-repairing systems.
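The scan time-and-cost wall can be illustrated with a first-order estimate. The sketch below is not taken from Mitra's tutorial; the function name `scan_test_cost` and every number in the usage lines (cell count, pattern count, chain count, shift clock) are hypothetical values chosen only to show how data volume and test time scale with design size.

```python
def scan_test_cost(scan_cells, patterns, chains, shift_hz):
    """First-order estimate of scan test data volume (bytes) and shift time (seconds).

    Simplifying assumptions (not from the source): no test compression,
    scan cells split evenly across chains, and capture cycles ignored.
    """
    # Each pattern needs one stimulus bit and one expected-response bit per cell.
    data_bits = 2 * scan_cells * patterns
    # The longest chain sets the number of shift cycles per pattern.
    chain_length = -(-scan_cells // chains)  # ceiling division
    # Unloading one pattern overlaps loading the next, so there is one
    # shift per pattern plus a final unload.
    cycles = chain_length * (patterns + 1)
    return data_bits / 8, cycles / shift_hz


# Hypothetical design: 10 million scan cells, 20,000 patterns,
# 16 scan chains, 25-MHz shift clock.
data_bytes, seconds = scan_test_cost(10_000_000, 20_000, 16, 25_000_000)
print(f"{data_bytes / 1e9:.0f} GB of test data, {seconds / 60:.1f} minutes of shifting")
```

With these assumed figures the estimate comes to about 50 Gbytes of test data and a little over eight minutes of shift time, which is the scale of problem that motivates built-in self-test.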
Christof Paar of Ruhr-Universität Bochum gave the second tutorial on the evolving field of cryptographic engineering. The pervasive-computing movement has forced designers to integrate security considerations into all types of processors and controllers. Designers sometimes graft this cryptographic functionality onto existing designs without consideration for efficiency and completeness. The various types of cryptographic schemes have implementation considerations for both hardware and software. A very interesting introduction described side-channel attacks, the most important class of real-world attack against cryptographic schemes within processors.
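Timing is one well-known kind of side channel, and a small sketch shows why implementation details matter. The example below is illustrative only and is not drawn from Paar's tutorial: `leaky_equal` and `constant_time_equal` are hypothetical names, and the step counter stands in for execution time that an attacker could measure. The only library call used is Python's standard `hmac.compare_digest`.

```python
import hmac


def leaky_equal(secret: bytes, guess: bytes):
    """Early-exit comparison. Returns (match, byte comparisons performed).

    The amount of work depends on how many leading bytes of the guess are
    correct, so running time leaks information about the secret.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Comparison whose running time does not depend on where bytes differ."""
    return hmac.compare_digest(secret, guess)


secret = b"K3Y!"  # hypothetical secret value
print(leaky_equal(secret, b"XXXX"))  # wrong first byte: stops immediately
print(leaky_equal(secret, b"K3XX"))  # two correct bytes: runs longer
print(constant_time_equal(secret, b"K3XX"))
```

An attacker who can time the early-exit version can recover the secret one byte at a time, which is why cryptographic code favors comparisons whose work is independent of the data.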
In this issue of IEEE Micro, the article by Rusu, Muljono, and Cherkauer gives an update on Intel's 64-bit Itanium processor family. The Itanium 2 processor 6M is striking for both its attention to power concerns and its cache hierarchy. Power consumption has become a growing concern for most high-end processors in the last few years. In this context, it is impressive to note that this 1.5-GHz design delivers a 50 percent frequency increase over the previous implementation, while maintaining the same power envelope. Additionally, for those still tracking stats, this design's on-die cache size (6 Mbytes) and overall transistor count (410 million) are the largest ever reported for a shipping microprocessor.
SUN'S TWIN PROCESSORS ON A CHIP
Kapil et al. give a design team's insight into Sun Microsystems' new processor. The article offers a detailed architecture and design overview of a two-core, on-chip multiprocessor called Gemini, aimed at Web server workloads. Sun aims its new throughput computing initiative at increasing the execution efficiency of such throughput-oriented workloads by delivering more work per unit time and per unit of power spent. Compared to earlier UltraSparc II derivative designs, the Gemini processor delivers about twice as much performance per watt of expended power.
TI'S MEDIA PROCESSOR
Deepu Talla et al. give a design team's summary of an SoC, emphasizing the focus that chip companies are putting on consumer electronics. Fixed-function designs of the past are slowly giving way to highly integrated, flexible, and programmable counterparts to improve upgradability and multistandard flexibility, and to lower overall system cost. The article describes such a design from Texas Instruments: the DM310, a portable digital media processor. The target devices for this design include digital still cameras, snapshot photo printers, and video cell phones. The DM310 SoC design includes a digital signal processor core (the C54x) along with an ARM925 system processor core, supported by three coprocessors that target video/imaging functions such as (de)quantization and variable-length encoding and decoding. On an impressive note, this design can achieve real-time MPEG-1 and MPEG-4 video encoding at common intermediate format resolution. It also decodes at VGA resolution using just 400 mW of power.
The article by Kalla, Sinharoy, and Tendler unveils IBM's Power5. The Power5 chip uses two-way SMT on each core of a dual-core chip. This 2 × 2 approach maintains compatibility with Power4 while allowing multiprocessor scalability to 64 physical processors. Power5's SMT design has some novel features. It allows a fast, single-threaded execution mode. In multithreaded operation, it implements resource balancing to ensure thread fairness in the use of system resources. It also supports a thread priority set by the software. As with many of the new processor designs, the Power5 has dynamic power management that reduces power consumption while maintaining performance.
Uri Cummings, cofounder of Fulcrum Microsystems, presents the company's new switch design in this issue. The article is interesting in several ways. Fulcrum intends the PivotPoint switch for SoC applications to interconnect various processors, memories, and functional units. PivotPoint replaces the main SoC bus with an unusual asynchronous crossbar switch. Because it is asynchronous, the switch can interconnect various units with differing synchronous time domains. Fulcrum calls this type of device a blade-level switch. The article presents a nice analysis of why Fulcrum's design team decided to implement certain features either synchronously or asynchronously.
A THOUGHTFUL DIGRESSION
Those attending Hot Chips 15 had a moment of introspection when Nick Tredennick chaired a session entitled "Disasters I Have Known." The bursting of the technology bubble over the last few years has brought pain to many. A few brave souls came up to the podium and described their own situations. Many of us were (ahem) too bashful. What became apparent is the small margin between huge success and total failure. And that margin frequently has little to do with technology or engineering excellence, and a lot to do with good fortune. It's clear that engineering is not a completely exact science, but entrepreneurial engineering is almost a contradiction in terms.
Any conference is only as good as the people who created it. As program cochairs, we'd especially like to acknowledge the committee members for their help in making Hot Chips 15 possible: Forest Baskett, Allen Baum, John Crawford, Keith Diefendorff, Henry Moreton, Tadao Nakamura, Howard Sachs, John Sell, Alan Smith, Marc Tremblay, and John Wawrzynek.
We hope these articles give you some flavor for the exciting work going on in our field. Maybe next year, your project will have a spot in the Hot Chips program; see you there.
Michael Flynn is emeritus professor of electrical engineering at Stanford University. His early work experience was at IBM, where he was a design manager for various early mainframe computers, including the IBM System/360 Model 91. Beginning in 1975, Flynn directed the computer architecture and arithmetic group at Stanford until his retirement in 1999. Flynn has a PhD from Purdue University in electrical engineering. He received the ACM/IEEE Eckert-Mauchly Award, the IEEE CS Harry Goode Memorial Award, and the Tesla Medal from the International Tesla Society (Belgrade). He also has an honorary doctor of science from Trinity College, Ireland, and is a Fellow of the IEEE and the ACM.
Pradeep Dubey is a senior principal engineer and manager of innovative platform architecture in the Corporate Technology Group at Intel. His research interests include computer architectures for new application paradigms in future computing environments. Dubey previously worked at the IBM T.J. Watson Research Center and at Broadcom. He was one of the principal architects for the AltiVec multimedia extension to the PowerPC architecture. He also worked on the design, architecture, and performance issues of various microprocessors, including Intel's 80386, 80486, and Pentium processors. Dubey has a BS in electronics and communication engineering from Birla Institute of Technology, India; an MSEE from the University of Massachusetts at Amherst; and a PhD in electrical engineering from Purdue University. He holds 18 patents and is an IEEE Fellow.