Stanford University
New Enterprise Associates
Pages: pp. 14-15
Chips are hotter than ever! Many of the chips presented at the 11th Hot Chips Symposium, held in August 1999 at Stanford University, can execute many billions of operations per second. These are not expensive supercomputers used only in national laboratories and large companies. They are special-purpose chips used in communication and entertainment systems that are even more affordable and pervasive than PCs.
Hot Chips continues to be the symposium that engineers attend to learn about the technical details of cutting-edge commercial processor chips and system designs. Packed with keynote speeches, technical papers, tutorials, and panels, the conference last year attracted 850 attendees. The largest computer companies and the newest start-ups presented papers on a range of chips from general-purpose processors to highly specialized devices. The industry has moved so fast that two of the start-ups that submitted abstracts had already been acquired by the time they presented!
This special issue is a collection of the best and most innovative works presented at the conference. Together, these six articles provide an excellent illustration of the classical trade-off between generality and programmability on the one hand and hardware cost-effectiveness on the other.
"The Broadband Revolution" is a condensed transcript of the keynote address by Henry Samueli, cochair and chief technical officer at Broadcom Corporation. Samueli gives a crisp overview of the underlying IC technology that will increase the communication bandwidth to the average home by three orders of magnitude, from tens of kilobits per second to tens of megabits per second. He presents the architecture and the implementation of state-of-the-art broadband ICs for five major applications: satellite receivers, DSL, digital cable TV and cable modems, home networking, and gigabit Ethernet. He conveys both the complexity of existing systems and the challenges to come in building future ASICs with hundreds of millions of transistors.
Frank and Holloway, also from Broadcom, delve into the specific details of home networking. Home networking, used currently only to connect multiple computers, will become more important as network appliances become pervasive. This article focuses on using existing POTS (plain old telephone service) wiring as a low-cost interconnect solution. Standardized under the auspices of the Home Phoneline Networking Association (HPNA), this technology is in its second generation, operating at speeds up to 16 Mbps. Phone line networking, unlike traditional Ethernet, must work robustly over a widely disparate range of transmission channels that have significant dynamic impairments. This article covers the system requirements, the PHY and MAC layers of the HPNA 2.0 standard, and the implementation of Broadcom's iLine-10 chip set.
Another important application that can exploit arbitrarily high computation bandwidth is video games. The next-generation Sony PlayStation has a new processor known as the Emotion Engine. It is designed to support 3D graphics and emotion synthesis, a term coined to mean real-time synthesis of animation scenes that can arouse a viewer's emotions. A live demonstration of the system at Hot Chips treated the audience to a taste of this emotion synthesis and the new generation of video games to come.
The Emotion Engine achieves an impressive peak computational bandwidth of 5.5 Gflops by using two different "vector units" that employ a combination of VLIW and SIMD techniques. Kunimatsu and colleagues show that the processor at 300 MHz can execute important graphics primitives more than twice as fast as a 600-MHz Pentium III using SSE. Such a performance differential is necessary to justify the existence of dedicated game consoles.
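To put these figures in perspective, the short calculation below derives two numbers from the values quoted above: the peak floating-point operations per clock cycle, and the minimum per-clock advantage implied by "more than twice as fast" at half the clock rate. It is only back-of-the-envelope arithmetic on the stated figures; the variable names are ours, and it says nothing about sustained performance.

```python
# Figures quoted in the text above.
PEAK_FLOPS = 5.5e9      # Emotion Engine peak: 5.5 Gflops
EE_CLOCK_HZ = 300e6     # Emotion Engine clock: 300 MHz
P3_CLOCK_HZ = 600e6     # Pentium III clock: 600 MHz
MIN_SPEEDUP = 2.0       # "more than twice as fast" taken as a lower bound

# Peak floating-point operations issued per clock cycle.
flops_per_cycle = PEAK_FLOPS / EE_CLOCK_HZ

# Work done per clock relative to the Pentium III, at minimum.
per_clock_advantage = MIN_SPEEDUP * P3_CLOCK_HZ / EE_CLOCK_HZ

print(f"peak flops per cycle: {flops_per_cycle:.1f}")
print(f"per-clock advantage on these primitives: at least {per_clock_advantage:.0f}x")
```

A peak of roughly 18 floating-point operations per cycle is the kind of figure that only wide parallel issue can reach, which is consistent with the combined VLIW and SIMD design described above.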
Targeting media processing as a whole, the Equator MAP1000A processor is also designed to deliver a very high computational bandwidth, but with a greater emphasis on programmability and generality. Like the Emotion Engine, the processor uses a combination of VLIW and SIMD techniques. It supports a variety of data types, from 8-bit and 16-bit fixed-point data to 32-bit floating-point numbers. To alleviate the memory bottleneck, the chip includes a novel data-streaming unit that can autonomously bring data into the multiported data cache while the processor continues to compute with data already in the cache. The article by Basoglu and associates presents not just the processor architecture but also the compiler technology, because a good compiler is critical to the acceptance of a media processor. The Equator processor offers an interesting programmable alternative to hardwired implementations for media processing.
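The idea of overlapping data movement with computation can be illustrated in software. The sketch below is only an analogy, not a model of the MAP1000A hardware: a background thread stands in for the data-streaming unit and a small bounded queue stands in for the cache. The names stream_blocks and sum_of_squares, the block size, and the queue depth are all our own assumptions.

```python
import queue
import threading

def stream_blocks(source, block_size, depth=2):
    """Yield blocks of `source` while a helper thread fetches ahead.

    The bounded queue lets the fetcher run at most `depth` blocks
    ahead of the consumer, much like a small on-chip buffer.
    """
    buffer = queue.Queue(maxsize=depth)

    def fetch():
        for start in range(0, len(source), block_size):
            buffer.put(source[start:start + block_size])
        buffer.put(None)  # sentinel: no more data

    threading.Thread(target=fetch, daemon=True).start()
    while (block := buffer.get()) is not None:
        yield block

def sum_of_squares(data, block_size=4):
    """Compute on the current block while the next one is being fetched."""
    total = 0
    for block in stream_blocks(data, block_size):
        total += sum(x * x for x in block)
    return total

print(sum_of_squares(list(range(10))))
```

The consumer never asks for data explicitly; it simply finds the next block waiting, which is the property that hides memory latency behind computation.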
Gonzalez from Tensilica presents a new processor design methodology tailored to the ASIC design flow. Instead of a fixed processor architecture carefully crafted and tuned for a specific fabrication process, the Xtensa processor is a synthesizable core that an ASIC designer can configure, extend, and customize for a specific application. By adding specialized hardware functions to the processor, one can arrive at a cost-effective design that combines the programmability of processors with the efficiency of hardwired functional units. The processor extension is performed through a high-level specification interface, which shields users from the low-level details that make this process difficult and error-prone.
Changing the instruction set architecture, or ISA, means that the entire programming tool chain, such as the compiler and the instruction set simulator, must change accordingly. In this approach, all the programming tools are automatically generated from the same high-level specification of the extension. The ISA is an important abstraction; by giving designers the ability to define new ISAs easily, this approach may become a key component of the toolbox we use to combat the complexity of future systems on a chip.
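The principle of deriving every tool from one specification can be shown with a toy example. The sketch below is purely illustrative and does not use Tensilica's actual specification language or tools: the 16-bit encoding, the make_tools function, and the absdiff instruction are all hypothetical. A single table maps each mnemonic to an opcode and its semantics, and both a miniature assembler and a miniature simulator are generated from it.

```python
# Hypothetical one-source ISA description: mnemonic -> (opcode, semantics).
BASE_SPEC = {
    "add": (0x0, lambda a, b: (a + b) & 0xFFFFFFFF),
}

# The designer's extension is one more table entry (an assumed instruction).
EXTENDED_SPEC = dict(BASE_SPEC, absdiff=(0x1, lambda a, b: abs(a - b)))

def make_tools(spec):
    """Generate an assembler and a simulator from the same specification."""
    opcodes = {name: op for name, (op, _) in spec.items()}
    semantics = {op: fn for op, fn in spec.values()}

    def assemble(line):
        # Assumed format: "<mnemonic> rD, rS, rT", packed into 16 bits.
        name, rd, rs, rt = line.replace(",", " ").split()
        d, s, t = (int(r.lstrip("r")) for r in (rd, rs, rt))
        return (opcodes[name] << 12) | (d << 8) | (s << 4) | t

    def simulate(words, regs):
        # Decode each word and apply the semantics taken from the spec.
        for w in words:
            op, d, s, t = w >> 12, (w >> 8) & 0xF, (w >> 4) & 0xF, w & 0xF
            regs[d] = semantics[op](regs[s], regs[t])
        return regs

    return assemble, simulate

assemble, simulate = make_tools(EXTENDED_SPEC)
program = [assemble("absdiff r2, r0, r1")]
print(simulate(program, [7, 10, 0, 0]))
```

Because the assembler and the simulator read the same table, adding an instruction cannot leave one tool out of step with the other, which is the property the automatically generated tool chain is meant to guarantee.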
Hammond and associates from Stanford focus on improving the performance of general-purpose processors. Specifically, the authors examine the use of speculative thread-level parallelism in the context of CMP (chip multiprocessing). Their proposed system automatically turns a sequential program into multiple threads, which are then executed in parallel speculatively. The hardware keeps track of the dependencies between these threads, discards the side effects of a thread whenever the parallel execution is found to violate the dependencies, and restarts the threads. Because they provide a source of parallelism that complements instruction-level parallelism, speculative threads could greatly improve the performance of general-purpose microprocessors.
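The execute, check, discard, and restart cycle described above can be mimicked in a few lines of software. The following is a toy model under our own assumptions, not the mechanism proposed by Hammond and associates: loop iterations play the role of threads, a batch of them runs against a snapshot of memory with buffered writes, and commits happen in program order. The names run_speculative, load, store, and width are hypothetical.

```python
def run_speculative(body, n_iters, memory, width=4):
    """Run loop iterations speculatively in batches of `width`.

    Each iteration reads a snapshot and buffers its writes. At commit
    time, an iteration that read an address written by an earlier
    iteration of the same batch is squashed, along with all later
    ones, and execution restarts from it. Returns the squash count.
    """
    squashes = 0
    i = 0
    while i < n_iters:
        batch = range(i, min(i + width, n_iters))
        snapshot = dict(memory)
        results = []
        for it in batch:  # conceptually, these run in parallel
            reads, writes = set(), {}

            def load(addr, reads=reads, writes=writes):
                if addr in writes:      # forward the thread's own store
                    return writes[addr]
                reads.add(addr)
                return snapshot[addr]

            def store(addr, value, writes=writes):
                writes[addr] = value    # buffered until commit

            body(it, load, store)
            results.append((reads, writes))

        committed = set()
        for it, (reads, writes) in zip(batch, results):
            if reads & committed:       # dependence violated
                squashes += 1
                break                   # discard this and later iterations
            memory.update(writes)
            committed.update(writes)
            i = it + 1
    return squashes

# Independent iterations: every speculation succeeds.
cells = {k: k for k in range(8)}
print(run_speculative(lambda k, ld, st: st(k, ld(k) * 2), 8, cells), cells)

# A running sum: every iteration depends on the previous one.
acc = {"acc": 0}
print(run_speculative(lambda k, ld, st: st("acc", ld("acc") + k), 8, acc), acc)
```

In both cases the final memory matches sequential execution; the difference lies in how much speculative work is thrown away. The first loop commits whole batches, while the second is squashed repeatedly, which illustrates why the benefit depends on how often real dependencies occur between threads.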