pp. 97-98
As technology scaling enables the integration of billions of transistors on a chip, economies of scale are prompting the move toward parallel chip architectures, with both application-specific systems-on-a-chip (SoCs) and general-purpose microprocessors leveraging multiple processing cores on a single chip for better performance at manageable design costs. As these parallel chip architectures scale in size, on-chip networks are emerging as the de facto communication architecture, replacing dedicated interconnects and shared buses. At the same time, tight design constraints in the form of ever-increasing chip-crossing interconnect delays and power consumption are reaching criticality. On-chip networks must deliver good latency-throughput performance in the face of very tight power and area budgets. The interplay of these two trends makes on-chip network design one of the most challenging and significant problems system designers face in the near term.
New parallel chip architectures bring about unique delay and bandwidth requirements for on-chip networks that, in many ways, differ substantially from those of traditional multichip/multiboard interconnection networks found in multiprocessors and other "macro" system architectures. The exact requirements depend on the intended application and must be met with judicious use of precious silicon real estate under a tight power budget capped by battery life, power-delivery limits, and/or thermal characteristics. What's more, the impact on design and verification effort, as well as fault resilience, must also be considered. While the computer industry within the past decade has begun introducing on-chip network architectures based on multiple buses (such as ARM's AMBA, IBM's CoreConnect, and Sonics' Smart Interconnect IP) and, more recently, point-to-point switch fabrics (such as CrossBow's 2D mesh Xfabric and Fulcrum's crossbar-centric Nexus), no standards have emerged and none are on the horizon. "Which on-chip network architecture best increases chip functionality without negatively impacting achievable clock frequency, communication latency, bandwidth, flexibility, design/verification effort, and fault resiliency?" remains an open question.
In this special section, we showcase several major research thrusts in the on-chip networks area. The selected papers can be classified by their targeted chip system: application-specific embedded SoCs versus general-purpose microprocessors. In the former, the availability of application knowledge makes it feasible and effective to tailor the on-chip network architecture to the characteristics of the particular application(s). In addition, as design time for embedded SoCs critically impacts time-to-market, streamlining and optimizing the design process of the on-chip networks in such systems for flexibility, reuse, and speed is crucial. On the other hand, for general-purpose microprocessors, the break from the traditional single-core architecture opens up the architectural design space, which spawns a different set of requirements (some more relaxed, others more restrictive) for on-chip networks, motivating new network architectures and studies. In both types of systems, on-chip network designers have to grapple with very tight area, power, and wire-delay constraints.
The first two papers target application-specific chip systems. "Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors," by Neal K. Bambha and Shuvra S. Bhattacharyya, focuses on a specific phase of the synthesis design flow of on-chip networks for application-specific SoCs: the topology mapping phase. Application knowledge is leveraged here to co-optimize application mapping along with topology selection, allowing for irregular topologies. The proposed synthesis algorithm is based on the metric of network hops.
"NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip," by Davide Bertozzi, Antoine Jalabert, Srinivasan Murali, Rutuparna Tamhankar, Stergios Stergiou, Luca Benini, and Giovanni De Micheli, proposes a design process that provides a complete synthesis flow for on-chip networks. It starts from application specifications, continues through the mapping of the application onto candidate topologies and the selection of a topology, and culminates with the synthesis of router microarchitectures and simulation of the final network design, which allows for further design-space exploration. Area and power budgets given by users guide the synthesis process toward delay and reliability targets. The work demonstrates how the application-specific nature of a class of SoCs allows designers to optimize the on-chip network architecture for the application suites. In addition, automating the design process leads to flexible network architectures that can be parameterized and fine-tuned for a wide range of applications. Both papers demonstrate the effectiveness of tailoring on-chip network design to specific applications through case studies synthesizing a range of embedded SoCs, from DSPs to video processing applications and network processors.
The next two papers explore the impact of new chip multiprocessor architectures on on-chip network design. "On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures," by Joan-Manuel Parcerisa, Julio Sahuquillo, Antonio González, and José Duato, investigates clustered superscalar microarchitectures. These fall into a class of partitioned processor microarchitectures recently adopted by various industry processor chips that are designed to scale beyond single cores, with shared buses as their interconnection fabric. This paper motivates the need for more sophisticated on-chip networks as clustered microarchitectures scale beyond four cores, highlights their unique characteristics, and explores the effect of different network topologies on overall processor performance. The paper also proposes topology-aware instruction steering to further improve chip performance.
"Scalar Operand Networks," by Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, and Anant Agarwal, focuses on a particular class of on-chip networks: those that implement bypassing and transport operands between ALUs, which the authors refer to as scalar operand networks. The paper first distinguishes these networks from traditional multiprocessor networks that transport cache lines and other such coarser memory blocks, and it goes on to present a taxonomy for classifying the different kinds of scalar operand network architectures. Challenges in the design of these on-chip networks are then highlighted, and a 5-tuple model is proposed for characterizing the delay of such networks. Latency measurements of the implemented MIT Raw prototype chip are used to validate the 5-tuple delay model and to verify its effect on overall processor performance.
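To give a flavor of such a characterization, the sketch below models operand-transport cost as an additive function of five per-network components. The component names, their values, and the simple additive form here are our illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a 5-tuple operand-transport cost model.
# All component names and example values are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class FiveTuple:
    send_occupancy: int     # cycles the sender is busy injecting the operand
    send_latency: int       # cycles before the operand enters the network
    hop_latency: int        # cycles per network hop traversed
    receive_latency: int    # cycles from arrival to availability at the ALU
    receive_occupancy: int  # cycles the receiver spends extracting the operand

    def operand_cost(self, hops: int) -> int:
        """End-to-end cycles to move one operand across `hops` hops."""
        return (self.send_occupancy + self.send_latency
                + hops * self.hop_latency
                + self.receive_latency + self.receive_occupancy)

# Hypothetical comparison: a tightly coupled operand network versus a
# loosely coupled, message-passing-style interconnect.
tight = FiveTuple(0, 1, 1, 1, 0)
loose = FiveTuple(3, 2, 1, 2, 3)
print(tight.operand_cost(hops=4))  # 6
print(loose.operand_cost(hops=4))  # 14
```

Even with identical per-hop latency, the fixed send/receive overheads dominate for short distances, which is why networks intended to carry individual scalar operands must minimize those end-point costs.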
We are very grateful to all who have helped to bring about this special section. Ten papers were submitted, of which four were accepted. We thank the authors of all submitted papers for their contribution to this special section and the reviewers for their helpful comments and recommendations. We also acknowledge the excellent job of Ms. Suzanne Werner and Ms. Jennifer Carruth of the IEEE Transactions on Parallel and Distributed Systems manuscript office in helping to manage the entire submission, review, and publication process. Finally, this special section would not have been possible without the support and encouragement of Dr. Pen-Chung Yew, IEEE Transactions on Parallel and Distributed Systems Editor-in-Chief, who recognized the importance of the on-chip networks area and made the ultimate decision to allow it to be featured in this issue.
Timothy Mark Pinkston