As technology scaling enables the integration of billions of transistors on a chip, economies of scale are prompting a move toward parallel chip architectures: both application-specific systems-on-a-chip (SoCs) and general-purpose microprocessors now leverage multiple processing cores on a single chip for better performance at manageable design cost. As these parallel chip architectures scale in size, on-chip networks are emerging as the de facto communication architecture, replacing dedicated interconnects and shared buses. At the same time, design constraints such as ever-increasing chip-crossing interconnect delays and power consumption are becoming critical. On-chip networks must deliver good latency-throughput performance within very tight power and area budgets. The interplay of these two trends makes on-chip network design one of the most challenging and significant problems that system designers face in the near term.
New parallel chip architectures impose delay and bandwidth requirements on on-chip networks that, in many ways, differ substantially from those of the traditional multichip/multiboard interconnection networks found in multiprocessors and other "macro" system architectures. The exact requirements depend on the intended application, and they must be met with judicious use of precious silicon real estate under a tight power budget capped by battery life, power delivery limits, and/or thermal characteristics. Moreover, the impact on design and verification effort, as well as on fault resilience, must be considered. Although within the past decade the computer industry has begun introducing on-chip network architectures based on multiple buses (such as ARM's AMBA, IBM's CoreConnect, and Sonics' Smart Interconnect IP) and, more recently, point-to-point switch fabrics (such as CrossBow's 2D mesh Xfabric and Fulcrum's crossbar-centric Nexus), no standards have emerged and none are on the horizon. Which on-chip network architecture best increases chip functionality without degrading achievable clock frequency, communication latency, bandwidth, flexibility, design/verification effort, and fault resilience remains an open question.
In this special section, we showcase several major research thrusts in the on-chip networks area. The selected papers can be classified by their target chip systems: application-specific embedded SoCs versus general-purpose microprocessors. For the former, the availability of application knowledge makes it feasible and effective to tailor the on-chip network architecture to the characteristics of the particular application(s). In addition, because design time for embedded SoCs critically affects time-to-market, it is crucial to streamline and optimize the design process of on-chip networks in such systems for flexibility, reuse, and speed. For general-purpose microprocessors, on the other hand, the break from traditional single-core architectures opens up the architectural design space, spawning a different set of requirements for on-chip networks (some more relaxed, others more restrictive) and motivating new network architectures and studies. In both types of systems, on-chip network designers must grapple with very tight area, power, and wire delay constraints.
The first two papers target application-specific chip systems. "Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors," by Neal K. Bambha and Shuvra S. Bhattacharyya, focuses on a specific phase of the on-chip network synthesis design flow for application-specific SoCs: the topology mapping phase. Here, application knowledge is leveraged to cooptimize application mapping and topology selection, allowing for irregular topologies. The proposed synthesis algorithm uses network hops as its metric.
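To make the idea of a hop-based metric concrete, the following minimal sketch scores a task-to-node mapping by summing, over every communicating task pair, the communication volume multiplied by the hop count between the nodes those tasks occupy. It is an illustration under stated assumptions, not the authors' algorithm: the topology is assumed fixed (the paper cooptimizes topology as well), hop counts are assumed to be shortest-path lengths, and all function names, the traffic format, and the exhaustive search are hypothetical choices suited only to tiny inputs. Breadth-first search is used so that irregular topologies are handled as easily as regular meshes.

```python
from collections import deque
from itertools import permutations


def hop_distances(adjacency):
    """All-pairs hop counts, via breadth-first search from every node.

    `adjacency` maps each node to a list of its neighbors (hypothetical
    format); irregular topologies are allowed.
    """
    dist = {}
    for src in adjacency:
        seen = {src: 0}
        frontier = deque([src])
        while frontier:
            u = frontier.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    frontier.append(v)
        dist[src] = seen
    return dist


def mapping_cost(traffic, mapping, dist):
    """Hop-weighted cost: sum of (communication volume x hop count) over task pairs."""
    return sum(vol * dist[mapping[a]][mapping[b]] for (a, b), vol in traffic.items())


def best_mapping(tasks, nodes, traffic, adjacency):
    """Exhaustively try every one-to-one task-to-node assignment (tiny inputs only)."""
    dist = hop_distances(adjacency)
    best = None
    for perm in permutations(nodes, len(tasks)):
        mapping = dict(zip(tasks, perm))
        cost = mapping_cost(traffic, mapping, dist)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best
```

For example, on a four-node chain `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` with traffic `{("A", "B"): 10, ("B", "C"): 5}`, the search places each communicating pair on adjacent nodes, for a cost of 15. A practical synthesis tool would replace the exhaustive search with a heuristic and would also vary the topology itself.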
"NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip," by Davide Bertozzi, Antoine Jalabert, Srinivasan Murali, Rutuparna Tamhankar, Stergios Stergiou, Luca Benini, and Giovanni De Micheli, proposes a complete synthesis flow for on-chip networks. The flow starts from application specifications, proceeds through mapping the application onto candidate topologies and selecting a topology, and culminates in the synthesis of router microarchitectures and simulation of the final network design, which enables further design-space exploration. User-specified area and power budgets guide the synthesis process toward delay and reliability targets. The work demonstrates how the application-specific nature of a class of SoCs allows designers to optimize the on-chip network architecture for the target application suites. In addition, automating the design process leads to flexible network architectures that can be parameterized and fine-tuned for a wide range of applications. Both papers demonstrate the effectiveness of tailoring on-chip network design to specific applications through case studies that synthesize a range of embedded SoCs, from DSP and video processing applications to network processors.
The next two papers explore the impact of new chip multiprocessor architectures on on-chip network design. "On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures," by Joan-Manuel Parcerisa, Julio Sahuquillo, Antonio González, and José Duato, investigates clustered superscalar microarchitectures. These belong to a class of partitioned processor microarchitectures, recently adopted in various industry processor chips, that are designed to scale beyond single cores and use shared buses as their interconnection fabric. The paper motivates the need for more sophisticated on-chip networks as clustered microarchitectures scale beyond four cores, highlights the unique characteristics of these networks, and explores the effect of different network topologies on overall processor performance. It also proposes topology-aware instruction steering to further improve chip performance.
"Scalar Operand Networks," by Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, and Anant Agarwal, focuses on a particular class of on-chip networks: those that implement bypassing and transport operands between ALUs, which the authors call scalar operand networks. The paper first distinguishes these networks from traditional multiprocessor networks, which transport cache lines and other coarser memory blocks, and then presents a taxonomy for classifying scalar operand network architectures. It goes on to highlight challenges in the design of these networks and proposes a 5-tuple model for characterizing their delay. Latency measurements from the implemented MIT Raw prototype chip are used to validate the 5-tuple delay model and to evaluate how these delays affect overall processor performance.
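The editorial does not list the five components of the delay model. To our understanding of Taylor et al.'s formulation, they are send occupancy, send latency, network hop latency, receive latency, and receive occupancy, written <SO, SL, NHL, RL, RO>. The sketch below records such a tuple and computes an illustrative per-transfer cost under an assumption of our own: that the components simply add up along a route of a given number of hops. Occupancy (cycles an ALU is tied up) and latency (delay seen by the operand) capture different costs, so the paper's actual accounting may differ; the class and method names and the example values are hypothetical, not measured Raw figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    """Per-transfer costs in cycles, in the order <SO, SL, NHL, RL, RO>."""

    send_occupancy: int       # SO: cycles the sending ALU is tied up
    send_latency: int         # SL: send-side delay before the operand enters the network
    network_hop_latency: int  # NHL: delay added by each network hop
    receive_latency: int      # RL: receive-side delay after the operand leaves the network
    receive_occupancy: int    # RO: cycles the receiving ALU is tied up

    def transfer_cycles(self, hops: int) -> int:
        """Illustrative additive estimate of one operand transfer across `hops` hops.

        Assumption: occupancy cycles are counted as added delay; a real model
        may treat occupancy and latency separately.
        """
        if hops < 0:
            raise ValueError("hops must be non-negative")
        return (self.send_occupancy + self.send_latency
                + hops * self.network_hop_latency
                + self.receive_latency + self.receive_occupancy)
```

With the hypothetical tuple `FiveTuple(1, 1, 1, 2, 1)`, a three-hop transfer is estimated at 8 cycles. Keeping the five terms separate, rather than folding them into one number, shows which part of the path (endpoints or hops) dominates as operands travel farther.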
We are very grateful to all who have helped to bring about this special section. Ten papers were submitted, of which four were accepted. We thank the authors of all submitted papers for their contributions to this special section and the reviewers for their helpful comments and recommendations. We also acknowledge the excellent job of Ms. Suzanne Werner and Ms. Jennifer Carruth of the IEEE Transactions on Parallel and Distributed Systems manuscript office in helping to manage the entire submission, review, and publication process. Finally, this special section would not have been possible without the support and encouragement of Dr. Pen-Chung Yew, IEEE Transactions on Parallel and Distributed Systems Editor-in-Chief, who recognized the importance of the on-chip networks area and made the ultimate decision to feature it in this issue.
Li-Shiuan Peh
Timothy Mark Pinkston
• L.-S. Peh is with the Electrical Engineering Department, Princeton University, B228 Engineering Quadrangle, Princeton, NJ 08544.
• T.M. Pinkston is with the Department of Electrical Engineering Systems, University of Southern California, EEB 208, Hughes Aircraft Electrical Engineering Building, 3740 McClintock Ave., Los Angeles, CA 90089-2562. E-mail: email@example.com.
Published online 20 Dec. 2004.
Li-Shiuan Peh
received the BS degree in computer science from the National University of Singapore in 1995 and the PhD degree in computer science from Stanford University in 2001. She has been an assistant professor of electrical engineering at Princeton University since 2002. She is a recipient of the 2003 US National Science Foundation CAREER Award and the 2004 recipient of the E. Lawrence Keys/Emerson Electric Co. Faculty Advancement Award from Princeton University's School of Engineering and Applied Sciences. She has been a program committee member for several conferences (HPCA, SIGMETRICS, Hot Interconnects, ICPP, HiPC, etc.) and workshops (PACS, TACS, SAN, etc.). Her research focuses on power-aware interconnection networks, on-chip networks, and parallel computer architectures, and is funded by several grants from the US National Science Foundation, the DARPA MARCO Gigascale Systems Research Center, and Intel Corporation. She is a member of the IEEE.
Timothy Mark Pinkston
received the BSEE degree from The Ohio State University in 1985 and the MS and PhD degrees in electrical engineering from Stanford University in 1986 and 1993, respectively. Prior to joining the University of Southern California (USC) in 1993, he was a member of the technical staff at Bell Laboratories, a Hughes Doctoral Fellow at Hughes Research Laboratory, and a visiting researcher at the IBM T.J. Watson Research Laboratory. Currently, Dr. Pinkston is a professor and director of the Computer Engineering Division of the EE-Systems Department at the University of Southern California, and he heads the SMART Interconnects Group. His current research interests include the development of deadlock-free adaptive routing techniques and of on-chip network and router architectures for achieving high-performance communication in microprocessors and parallel computer systems, including scalable parallel processor and cluster computing systems. Dr. Pinkston has authored more than 75 refereed technical papers and has received numerous awards, including the Zumberge Fellow Award, the US National Science Foundation Research Initiation Award, and the US National Science Foundation CAREER Award. Dr. Pinkston is a senior member of the IEEE and a member of the ACM. He has also been a member of the program committee for several major conferences (ISCA, HPCA, ICPP, IPPS/IPDPS, ICDCS, SC, CS&I, CAC, PCRCW, OC, MPPOI, LEOS, WOCS, and WON), the program chair for HiPC '03, the program vice-chair for EuroPar '03 and ICPADS '04, the program cochair for MPPOI '97, the tutorials chair for ISCA '04, the workshops chair for ICPP '01, and the finance chair for Cluster 2001. In addition to serving as the coguest editor for this special section, he has served two 2-year terms as an associate editor for the IEEE Transactions on Parallel and Distributed Systems.