Issue No. 03 - May/June (2011 vol. 31)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MM.2011.59
Mikko Lipasti, University of Wisconsin-Madison
Natalie Enright Jerger, University of Toronto
Very large-scale computing encompasses a varied range of systems, including supercomputers, data centers, cloud computing, and loosely coupled, volunteer-based distributed computing. These systems can vary dramatically in terms of microarchitecture, interconnection networks, storage subsystems, and I/O. This breadth of systems brings an equally broad range of applications, from scientific computing to cloud-based servers augmenting mobile platforms. Very large-scale computing systems' data storage and computational capabilities enable applications that enrich user interaction with digital media, and they yield solutions or provide insight into some of society's grand challenges in areas such as health and medicine.
Very large-scale computing systems encompass a broad range of architectures, from simple, low-power cores to application-specific processors and heterogeneous systems. They also vary in programming model, including message passing, shared memory, and instruction set extensions for accelerators. Despite this wide disparity of system architectures, these systems all face issues of workload scalability, performance, energy consumption, and total cost of ownership (TCO), to name a few. This special issue aims to share with readers the various trade-offs these systems make in terms of peak performance, scalability, power delivery, cooling requirements, bandwidth, and reliability. It also explores the challenges of simulating and benchmarking such systems.
As guest editors, we're honored to introduce this IEEE Micro special issue on very large-scale computing systems. Submissions were solicited across a broad range of issues and domains, including supercomputing, data centers, and mobile platforms. The articles in this issue represent a small sampling of issues related to very large-scale computing systems. We hope that this issue will generate further discussion and continued interest in this important class of systems.
Large-scale computing applications
The articles in this issue span a wide range of ongoing work in this field, tackling different application domains and exploring a range of possible architectures. The first two articles focus on specific applications that run on very large-scale computing systems. Analysis and understanding of application demands can lead to architectures and systems that significantly improve performance.
In the first article, "Overcoming Communication Latency Barriers in Massively Parallel Scientific Computation," Ron O. Dror et al. examine communication overheads. They focus on architectural and algorithmic approaches to reducing communication latency for molecular dynamics simulation. Their Anton machine, built for molecular dynamics, provides a case study in reducing communication bottlenecks in supercomputing applications. With the increasing computational power of modern architectures, communication latency will become an even more significant overhead in very large-scale computing systems.
"CogniServe: Heterogeneous Server Architecture for Large-Scale Recognition," by Ravi Iyer et al., focuses on a specific class of applications relevant to very large-scale computing systems: image and speech recognition. With the proliferation of smart mobile platforms and their ability to capture large quantities of data and interact with cloud-based servers, architectural optimizations for these recognition workloads are sorely needed. The authors propose a heterogeneous architecture with simple cores and hardware accelerators targeting hot spots in recognition workloads.
Simulation of large-scale computing platforms
The next two articles examine issues related to simulating and evaluating large-scale computing platforms. Given these systems' size and complexity, tractable simulation techniques are needed to evaluate performance and power consumption.
In "Simulating Whole Supercomputer Applications," Juan Gonzalez et al. present a methodology to reduce simulation time for large-scale message-passing interface (MPI) applications. This work leverages MPI tasks' independent nature to extract representative computation regions between communication instances. A small number of these regions can be simulated to drastically reduce simulation time with reasonable accuracy.
In "Automated Full-System Power Characterization," Stijn Polfliet et al. develop a methodology for automatically generating synthetic benchmarks used to characterize power consumption in multicore servers. In particular, this article considers the contributions of I/O and disks to overall power consumption rather than focusing only on the CPU and memory. Energy-related costs contribute substantially to the TCO of very large-scale computing systems; characterizing the power behavior of multithreaded, multicore architectures and of other system components is therefore essential when deploying these systems.
Whereas the previous article focuses on a methodology for measuring peak power consumption, the final article, "Energy-Aware Accounting and Billing in Large-Scale Computing Facilities" by Víctor Jiménez et al., advocates accurate accounting of user-specific energy usage. Energy-aware accounting benefits both the facility owner and the user. Like the previous article, this work is motivated by energy costs' contribution to TCO and the need for energy-efficient and energy-proportional large-scale computing facilities. The authors outline the hardware and software challenges that must be overcome to realize this type of accounting.
This issue represents a snapshot of work going on in this broad field. It's useful to connect microarchitects to those running large-scale applications to provide greater insight and identify further opportunities and demands for innovation. We hope that you will enjoy the selected articles, and we encourage you to provide feedback on this issue.
We are grateful to be able to present such a strong sampling of ongoing work related to very large-scale computing systems. We thank all the authors of submitted articles and the reviewers for providing detailed, constructive comments during the review process.
Natalie Enright Jerger is an assistant professor in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto. Her research interests include many-core and multicore architectures, on-chip networks, and cache coherence protocols. Enright Jerger has a PhD in electrical engineering from the University of Wisconsin–Madison. She's a member of IEEE, the IEEE Computer Society, and the ACM.
Mikko Lipasti is the Philip Dunham Reed Professor in Electrical and Computer Engineering at the University of Wisconsin–Madison. His research interests include improving performance and power efficiency, reducing bandwidth and complexity, and masking memory and interprocessor communication latencies in conventional computing systems, as well as novel biologically inspired computing systems modeled after the human neocortex. Lipasti received his PhD in electrical and computer engineering from Carnegie Mellon University. He's a senior member of IEEE.