Issue No. 04 - July/August (2007 vol. 27)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MM.2007.71
David H. Albonesi, Cornell University
Like other IEEE magazines, Micro publishes many theme issues, but we also receive excellent general-interest submissions. We would like to receive more of these, since they permit us to highlight important work in the community without requiring a number of submissions on a specific topic, and they also let us mix things up a bit. I know that I sometimes more readily crack open an issue of Micro that covers several different topics rather than a single theme. On that note, I am delighted to introduce the four articles featured in this issue.
In the first article, Assaf Shacham and Keren Bergman describe a very high-performance interconnection network using highly integrated photonic technology. As the authors point out in their sidebar, breakthroughs in the integrated photonics community—both in industry and academia—are coming at a furious pace. The authors' optical network, SPINet, is a very low-latency and high-bandwidth medium for HPC systems. The SPINet architecture was crafted with careful consideration of both the advantages of emerging integrated optical technology and its limitations—such as the high cost of buffering. The authors show that this prudent and clever combination of leading-edge optical technology and network architecture can yield very high performance and good power efficiency with a reasonable area cost.
In the next article, Li Zhao et al. discuss the ManySim simulation framework for future large-scale CMPs (LCMPs) composed of tens of potentially multithreaded cores. ManySim was developed to help address the fundamental architecture questions in LCMP design. As the authors point out, the design space of LCMPs is huge, and capturing every nuance in a single execution-driven simulator running a complex server workload may easily lead to unacceptably slow simulation speeds. The authors take the position that a set of complementary tools can more feasibly attack the LCMP design space. ManySim uses trace-driven simulation—including detailed profiling—and flexible modules to permit rapid first-order LCMP analysis. The authors discuss the ManySim framework at length and demonstrate how it can be used to gain insight into the bandwidth and cache requirements of future LCMPs designed along different dimensions.
Another interesting simulation framework, SimWattch, is presented in the third article, by Jianwei Chen, Michel Dubois, and Per Stenström. SimWattch is a full-system simulator that integrates the Simics functional simulator with the SimpleScalar/Wattch microarchitecture simulators. This combination permits full-system simulation, including the Solaris operating system, with detailed out-of-order superscalar microarchitecture performance and power modeling. As the authors point out, there are several important considerations in integrating these two simulators. The SimWattch control interface (SCI) module that they devised exploits Simics' highly optimized memory address path while correctly handling speculation (including load and store instructions along wrong paths), and it correctly maps the Simics ISA to the SimpleScalar ISA. A detailed comparison of OS and user code performance and power characteristics for server workloads running on an out-of-order superscalar model demonstrates SimWattch's value.
The last article, by Ivan Gonzalez, Estanislao Aguayo, and Sergio Lopez-Buedo, addresses self-configuring embedded systems. Many FPGAs, such as the Xilinx device used by the authors, include a processor core macro that can be implemented side by side with programmable logic. In an embedded system, the programmable logic can be used to implement application-specific coprocessors, and many such coprocessors may be necessary to handle all the applications, or protocols, that may need to be supported. The authors present the alternative of dynamically creating the coprocessors from the same FPGA area by exploiting the capacity that many FPGAs have for partial reconfiguration at runtime. The article describes the details of the hardware design and the tools that the authors developed to help automate the process. A reconfigurable coprocessor developed for three cryptographic algorithms achieves impressive area savings over a conventional FPGA implementation that includes all three coprocessors.
I hope that I have whetted your appetite to read the articles in full. I always welcome your feedback at email@example.com.