Issue No. 05 - September/October (2007 vol. 27)
DOI: http://doi.ieeecomputersociety.org/10.1109/MM.2007.85
Partha Kundu, Intel
Li-Shiuan Peh, Princeton University
Multicore chips are entering commercial and consumer markets at a ferocious pace. They first appeared in special-purpose, niche markets such as high-performance graphics and networking; examples are the eight-core IBM-Sony-Toshiba Cell processor, the 128-core Nvidia GeForce 8800 GPU, and the 188-core Cisco Silicon Packet Processor in the CRS-1 router. More recently, multicore chips have entered the general-purpose market as well: for example, the eight-core Sun T1 and the newly announced four-core Intel Xeon and AMD Barcelona. Even in the embedded system-on-chip (SoC) domain, ARM's MPCore can be configured with up to four cores.
As core counts scale up, it is becoming increasingly apparent that conventional ways of interconnecting cores, with buses and crossbars, cannot meet tight delay, power, and area budgets. A low-power, high-bandwidth, fast on-chip communication substrate is critically needed to enable scaling to large numbers of cores.
In recent years, on-chip networks have been proposed as the form such a communication substrate might take. According to Wikipedia, an on-chip network "is constructed from multiple point-to-point data links interconnected by switches (a.k.a. routers), such that messages can be relayed from any source module to any destination module over several links, by making routing decisions at the switches."
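To make this definition concrete, here is a minimal sketch of how a message is relayed hop by hop, with each router making a local routing decision. It assumes a 2D mesh (the topology several chips in this issue use) and dimension-order (XY) routing, a textbook scheme chosen purely for illustration; it is not attributed to any of the designs described here, and the names `xy_next_hop` and `route` are hypothetical.

```python
from typing import List, Tuple

# (x, y) position of a router in a 2D mesh; an illustrative assumption.
Coord = Tuple[int, int]


def xy_next_hop(current: Coord, dest: Coord) -> Coord:
    """Routing decision made locally at one router: fix X first, then Y."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:
        # Move one hop along the X dimension toward the destination column.
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        # X is resolved; move one hop along Y toward the destination row.
        return (cx, cy + (1 if dy > cy else -1))
    # Already at the destination router: deliver to the attached core.
    return current


def route(src: Coord, dest: Coord) -> List[Coord]:
    """Relay a message from src to dest; return every router it visits."""
    path = [src]
    node = src
    while node != dest:
        node = xy_next_hop(node, dest)
        path.append(node)
    return path


if __name__ == "__main__":
    # A message from the router at (0, 0) to the router at (2, 1).
    print(route((0, 0), (2, 1)))
```

Because every hop moves one step closer to the destination, the path length always equals the Manhattan distance between source and destination, and the fixed X-then-Y order is a simple way to avoid routing deadlock in a mesh.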
In this special issue of IEEE Micro, we set out to bring readers the latest advances in on-chip interconnects for multicores. We have specifically focused this special issue on novel on-chip networks realized in actual silicon. Part of our motivation in choosing this focus is to showcase a few silicon prototypes of on-chip networks used in multicore processors and SoCs; the other part is to draw attention to the implementation issues facing architects and designers.
The first six articles in this special issue gather insights and experiences gained from designing on-chip interconnects for multicores that span a diverse spectrum of target markets and architectures. The chips' target domains range from special-purpose processors to general-purpose computing to embedded multiprocessor SoCs. Architecturally, they range from compositions of heterogeneous blocks of complex macro IPs to a sea of homogeneous units as simple as a functional unit. The next two articles delve into design infrastructure support for on-chip networks, and the last article summarizes the grand research challenges for realizing next-generation on-chip networks and multicores.
We first present the silicon prototypes that are already commercial products. The special issue starts off with a look back at one of the first commercial multicore products with a high-performance on-chip interconnect: the IBM-Sony-Toshiba eight-core Cell processor, whose ring network interconnects eight processing elements with a coordinating processor. Although the Cell processor architecture has been much publicized, its interconnect has not been discussed and analyzed in depth. The article by Ainsworth and Pinkston offers an incisive analysis of the trade-offs in the Cell interconnect and valuable insights into its design choices.
Next, startup Tilera's TILE64 chip, with 64 identical cores, reached the market in August 2007. Wentzlaff et al. describe the detailed microarchitecture of the five mesh networks that interconnect the 64 cores and provide 1.2-Tbps bandwidth to each core. The authors also discuss the interfaces between the network and the processor pipelines, and the suite of software tools that aid in mapping programs onto the chip.
The third article, by Butts, describes another startup product with a high core count: Ambric's Am2045 chip, which has 360 32-bit RISC processor cores. The cores are interconnected by a hierarchical on-chip network that interfaces with the processors through registers for fast message passing and synchronization.
Aggressively pushing the research frontiers are various silicon prototypes from research labs and academia. The TRIPS chip from the University of Texas at Austin prototypes a novel parallel chip architecture with two major on-chip interconnection networks: the 25-node operand network, which provides communication among the 16 functional units, and the 40-node cache network, which interconnects the cache banks. The article by Gratz et al. explains how these networks are architected to deliver latency close to that of dedicated wires and presents a detailed critical-path analysis of the network components.
The Intel Teraflops 80-core processor chip, described by Hoskote et al., is a research prototype specifically intended to push the envelope of many-core interconnect design. It demonstrates a 5-GHz mesh network that delivers a bisection bandwidth of 2.56 Tbps within a 100-W power budget. The high-frequency, high-bandwidth design prompted innovations in clock distribution and synchronization, crossbar design, and power management.
Another interesting Intel prototype is the Scalable Communications Core (SCC) chip, which showcases Intel's forays into the SoC market. The SCC prototype, described by Arditti et al., is a reconfigurable, multiprotocol wireless SoC. The target domain demands fast time to market, which prompted a fully synthesized design with a 3 × 3 mesh interconnecting heterogeneous IP cores.
No discussion of on-chip networks can be complete without addressing the design infrastructure and the impact of process technology on the design tool suite. Pullini et al. discuss the challenges and degrees of freedom available in scaling several working designs from 130-nm to 65-nm technology. Evaluating the trade-offs in the context of large switch design, the authors urge designers to explore technology library variations to find the best fit for their requirements.
Evaluating interconnect architectures for complex multicore chips is becoming increasingly critical. Architecting an on-chip network for worst-case traffic patterns is not a good answer: it leads to inefficient, power-hungry choices and wastes valuable chip resources. As a result, designers must frequently rely on detailed simulation to test and evaluate their chips, which can become intractable for large many-core designs. Ogras et al. delve into the use of rapid FPGA prototyping as an alternative evaluation platform for networks on chip (NoCs), drawing on their experience prototyping on-chip networks for four different SoC designs. This article offers a glimpse of the effort FPGA prototyping requires and the benefits it brings, and it demonstrates how research ideas can be validated and their benefits quantified.
Grand challenges: Notes from OCIN
As a sign that on-chip networks will play an important role in the next generation of semiconductors, the National Science Foundation initiated a two-day workshop on On-Chip Interconnection Networks (OCIN), held at Stanford University in December 2006. The workshop brought together many academics, industry researchers, and practitioners in the field, and was charged with identifying the field's grand challenges.
Our last article, by Owens et al., aptly concludes this special issue with a summary of OCIN and the three grand challenges charted by the workshop's working groups: power, latency, and CAD compatibility. This article not only highlights the critical need for research into interconnects that enable many-core chips but also serves as a valuable handbook guiding researchers toward solving these grand challenges.
This special issue's focus on actual silicon prototypes demonstrates that on-chip networks are entering the marketplace aggressively. The articles present on-chip networks with widely diverse architectures, attesting to the field's infancy. In short, the design space for on-chip interconnects for multicores is wide open, and grand challenges await solution on the road to enabling many-core chips.
Partha Kundu is a senior staff researcher at Intel's Microprocessor Technology Labs (MTL) in Santa Clara, California. He was on the architecture team that defined the Intel Itanium ISA and a principal architect on a DEC/Alpha microprocessor. His interests are in solving technology and architecture issues related to realizing large multicore CPUs for next-generation applications and workloads: specifically, on-chip interconnects, power management, memory system enablement, and novel microarchitectures. He has an MS from the State University of New York, Stony Brook.
Li-Shiuan Peh is an assistant professor of electrical engineering at Princeton University. Her research interests include low-power interconnection networks, on-chip networks, and parallel computer architectures. Peh has a PhD in computer science from Stanford University and a BS in computer science from the National University of Singapore. She received the CRA-W Anita Borg Early Career Award in 2007, the Sloan Research Fellowship in 2006, and the NSF CAREER award in 2003.