Issue No. 02 - February (2003 vol. 36)
ISSN: 0018-9162
pp: 29-32
Russell Clapp, Fabric7 Systems, Inc.
Ashwini Nanda, IBM T.J. Watson Research Center
Kimberly Keeton, Hewlett-Packard Laboratories

The vast majority of multiprocessor server systems shipped today run commercial workloads. These workloads include classic database applications such as online transaction processing (OLTP) and decision support systems (DSS), as well as newer workloads including Web server, e-mail server, and multitier e-commerce applications. However, much of the research that influenced the design of these servers used scientific and technical workloads, such as the Standard Performance Evaluation Corporation's SPECint and SPECfp benchmarks.
Several factors motivated these choices, such as research funding agencies' priorities, researchers' experience with technical workloads, and the difficulty of working with commercial workloads, including their large hardware requirements, the complexity of tuning their hardware and software, and the lack of access to commercial application source code. This trend has been changing in recent years, however.
The Commercial Difference
With the maturing of commercial workload benchmarks for multiprocessor servers, described in the "Commercial Workload Benchmarks" sidebar, research studies have emerged that show how these workloads' behavior differs significantly from technical and scientific workloads. For example, these studies have generated several points of common wisdom for OLTP workloads. The many concurrent users in these applications lead to higher multiprogramming levels and context switch rates. As a result, OLTP workloads spend a nonnegligible percentage of their execution time in the operating system.
Typically short-lived, OLTP workload transactions access a small fraction of the overall database, resulting in random I/O access patterns that use relatively small disk request sizes. OLTP's cycles per instruction (CPI) rating runs considerably higher than that of the SPEC integer benchmarks. The majority of these cycles consists of instruction- and data-cache miss stalls from OLTP's random data access patterns and nonlooping branch behavior. As a result, these applications are more sensitive to memory latencies.
Researchers have also studied the impact of DSS workloads on server design. Often more complex and longer-running than OLTP queries, DSS queries typically scan large amounts of data, resulting in sequential disk I/O access patterns using large disk request sizes. Due to their sequential I/O patterns and lower multiprogramming levels, DSS workloads typically spend less time in the operating system. DSS queries also tend to have lower CPIs than OLTP, and relatively better cache hit ratios. With lower sensitivity to memory latency and high I/O bandwidth requirements, DSS workloads closely resemble some technical workloads.
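The CPI contrast between OLTP and DSS can be illustrated with a simple additive stall model: effective CPI is a base CPI plus, for each miss type, the misses per instruction multiplied by the miss penalty. The Python sketch below is only an illustration. The base CPI, miss rates, and penalties are hypothetical values chosen to mirror the qualitative trends described above, not measurements from any study, and the model assumes that miss stalls do not overlap with useful execution.

```python
def effective_cpi(base_cpi, misses_per_instr, penalty_cycles):
    """Base CPI plus stall cycles per instruction summed over miss types."""
    stall = sum(misses_per_instr[k] * penalty_cycles[k] for k in misses_per_instr)
    return base_cpi + stall


def latency_sensitivity(base_cpi, misses_per_instr, penalty_cycles,
                        key="memory", factor=2.0):
    """Relative CPI increase when one miss penalty is scaled by `factor`."""
    scaled = dict(penalty_cycles)
    scaled[key] = penalty_cycles[key] * factor
    before = effective_cpi(base_cpi, misses_per_instr, penalty_cycles)
    after = effective_cpi(base_cpi, misses_per_instr, scaled)
    return (after - before) / before


# Hypothetical miss penalties in cycles (assumed, not measured).
PENALTY = {"icache": 20, "dcache": 20, "memory": 200}

# Hypothetical misses per instruction for each workload class (assumed).
OLTP = {"icache": 0.03, "dcache": 0.04, "memory": 0.008}
DSS = {"icache": 0.005, "dcache": 0.02, "memory": 0.003}

if __name__ == "__main__":
    for name, profile in (("OLTP", OLTP), ("DSS", DSS)):
        cpi = effective_cpi(1.0, profile, PENALTY)
        sens = latency_sensitivity(1.0, profile, PENALTY)
        print(f"{name}: CPI {cpi:.2f}, +{sens:.0%} if memory latency doubles")
```

With these assumed inputs, the OLTP profile has a CPI of about 4.0 and the DSS profile about 2.1, and doubling the memory penalty raises the OLTP CPI by about 40 percent versus roughly 29 percent for DSS, which matches the direction of the trends reported in the literature.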
Evaluation Techniques
Researchers have used several evaluation techniques to study commercial workloads. Processor and chipset performance counters can measure processor and memory system behavior without slowing down application execution. However, performance counters measure only the underlying hardware architecture and do not permit significant exploration of architectural alternatives.
Analytical models
To address this shortcoming, researchers have built analytical and simulation models of the underlying system. Analytically modeling commercial workload behavior presents a significantly greater challenge than modeling scientific and technical workloads. In many cases, scientific and technical workload performance can be computed as the throughput of the floating point units on the system's processors; the operating system accounts for an insignificant fraction of the execution time. In contrast, the combination of nonlooping branch behavior, frequent calls to the OS, and a large degree of multiprogramming makes analytically modeling multiuser commercial server workloads difficult.
Thus, developers base their models for analyzing how well new designs handle commercial workloads on data collected from existing systems by means of processor and chipset performance counters. Although this approach has been used with some success, it does not capture the changes in the workload's execution based on the new system's environment.
Full-system simulation
Full-system simulation offers another approach to modeling new designs. Developers created simulators such as SimOS and SimICS to permit simulation of applications, the operating system, and architectural components. These simulators provide more accurate characterization of commercial workloads and enable exploration of architectural alternatives to better support those workloads. However, they can simulate only scaled-down versions of these workloads due to the exorbitant simulation time and space required to run real-life database sizes.
To address this limitation, a few studies have proposed rules of thumb for scaling back OLTP benchmarks to reduce the disk-space requirements. Others have proposed simplified database microbenchmarks to approximate the behavior of OLTP and DSS workloads.
The ongoing evolution of workloads further complicates the continuing challenge in modeling commercial workloads for architectural evaluation. The growth of networking has led to application deployments distributed across heterogeneous systems connected by high-speed networks. These multitier workloads result in systems that have a specialized function in each tier, including a Web server front-end tier, an application server middle tier, and a database server back-end tier. Researchers have only begun to characterize the behavior of these workloads' different components. Recently developed benchmarks that aim to represent their behavior include the TPC-W, SPECweb, SPECjbb, and ECPerf benchmarks.
Issue Overview
In this special issue, we seek to provide computing practitioners with an overview of commercial workload characteristics and how these workloads exercise computer systems. Specifically, the articles in this issue address the characteristics of multitier benchmarks and new approaches to the problem of accurately and cost-effectively modeling complex commercial workloads.
"Benchmarking Internet Servers on Superscalar Machines," by Yue Luo and colleagues, evaluates middle-tier Internet server application behavior on three different microarchitectural platforms, employing built-in processor hardware counters. These authors find that some of the same trends observed for multiuser database back-end workloads also apply to middle-tier applications.
In "TPC-W E-Commerce Benchmark Evaluation," Daniel and Javier García present a detailed characterization of the TPC-W multitier e-commerce benchmark. This analysis includes data on the sensitivity of benchmark performance to different hardware and software configuration options for a given testbed environment. These authors also discuss the appropriateness of the benchmark for representing different e-commerce server-usage models.
"Simulating a $2M Commercial Server on a$2K PC," by Alaa Alameldeen and colleagues, describes techniques for scaling back and tuning large commercial workloads so that they can be simulated on the less-expensive, less-powerful machines available to most researchers. The authors describe the Wisconsin Commercial Workload Suite, comprised of four scaled-down benchmarks that approximate workloads for an OLTP database, Java middleware, and static and dynamic Web servers.
In "Queuing Simulation Model for Multiprocessor Systems," Thin-Fong Tsuei and Wayne Yamamoto present a processor-queuing model that projects the performance characteristics of commercial workloads without requiring the complexity of execution- or trace-driven simulation. Given the workload complexity, they use a hybrid analytical model approach that lets event rates drive the simulation. When compared to traditional analytical models, their approach offers the advantage that it can capture the burstiness in time of different events to more accurately model their effect on performance.
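To see why burstiness matters, consider the minimal Python sketch below. It is not the authors' model. It assumes a single FIFO resource (for example, a memory bus) with a fixed service time and compares two hypothetical arrival streams that have the same average event rate, one evenly spaced and one arriving in bursts. The function names and all numeric values are illustrative assumptions.

```python
def mean_wait(arrival_times, service_time):
    """Mean queuing delay at a single FIFO server with fixed service time."""
    server_free_at = 0
    total_wait = 0
    for t in arrival_times:
        start = max(t, server_free_at)   # wait if the server is still busy
        total_wait += start - t
        server_free_at = start + service_time
    return total_wait / len(arrival_times)


def smooth_arrivals(mean_gap, n):
    """n requests spaced exactly mean_gap time units apart."""
    return [i * mean_gap for i in range(n)]


def bursty_arrivals(mean_gap, n, burst):
    """Requests arrive in groups of `burst` at the same instant.

    Groups are spaced burst * mean_gap apart, so the long-run
    event rate equals that of smooth_arrivals(mean_gap, n).
    """
    return [(i // burst) * burst * mean_gap for i in range(n)]


if __name__ == "__main__":
    # Assumed values: one request per 100 ns on average, 60 ns service time.
    gap_ns, service_ns, n = 100, 60, 1000
    print("smooth:", mean_wait(smooth_arrivals(gap_ns, n), service_ns), "ns")
    print("bursty:", mean_wait(bursty_arrivals(gap_ns, n, 4), service_ns), "ns")
```

Both streams present the same average rate, so a model driven by average rates alone would treat them identically. In this sketch the evenly spaced stream never queues, while the bursty stream waits 90 ns on average, because the four requests in each burst wait 0, 60, 120, and 180 ns.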
In "Designing Computer Architecture Research Workloads," Lieven Eeckhout and colleagues present a methodology for designing a short-running workload that behaves similarly to a long-running one, through the application of principal-component-analysis techniques. Using this approach, they show how the behavior of reduced or sampled versions of long-running benchmark suites can be validated against the full suite they aim to model. This approach can also be used to compare the detailed execution characteristics of different workloads.
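The general idea of comparing workloads in a principal-component space can be sketched in a few lines of Python. This is not the authors' methodology or tooling. It is a minimal illustration that standardizes a handful of workload characteristics, extracts the top two principal components by power iteration with deflation, and measures distances between workloads in the reduced space. The workload names and every characteristic value are hypothetical.

```python
import math

# Hypothetical characteristics per workload (assumed, not measured):
# [CPI, L2 misses per 1,000 instructions, branch mispredict %, OS time %]
WORKLOADS = {
    "oltp_full":  [4.0, 12.0, 6.0, 20.0],
    "oltp_short": [3.8, 11.0, 5.8, 18.0],
    "dss_full":   [1.8, 3.0, 2.0, 5.0],
    "dss_short":  [1.9, 3.5, 2.2, 6.0],
    "spec_int":   [1.2, 1.0, 4.0, 1.0],
}


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centered rows."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iters=1000):
    """Power iteration: dominant eigenvector and eigenvalue of m."""
    d = len(m)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, sum(a * b for a, b in zip(v, matvec(m, v)))


def principal_components(cov, k):
    """Top-k components via repeated power iteration and deflation."""
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v, lam = dominant_eigenpair(m)
        comps.append(v)
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    return comps


names = list(WORKLOADS)
z = standardize([WORKLOADS[n] for n in names])
components = principal_components(covariance(z), k=2)
scores = {n: [sum(a * b for a, b in zip(row, c)) for c in components]
          for n, row in zip(names, z)}


def distance(a, b):
    """Euclidean distance between two workloads in the reduced space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(scores[a], scores[b])))


if __name__ == "__main__":
    print("oltp_full vs oltp_short:", distance("oltp_full", "oltp_short"))
    print("oltp_full vs dss_full:  ", distance("oltp_full", "dss_full"))
```

In this toy data set, each shortened workload lands much closer to its full-length counterpart than to a workload of a different class, which is the kind of check such a validation would rely on. A production analysis would use many more characteristics and a numerical library rather than hand-written power iteration.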
Conclusion
Although much research progress has been made, many questions remain. Despite the importance of commercial workloads, over the past 10 years less than 15 percent of all evaluations in major computer architecture conferences used them. Further, workloads will continue to evolve, and new methodological questions will arise. For instance, what is the best way to fully simulate larger-scale multitier systems? What is the right way to set application configuration parameters and scale data sets?
Looking forward, we expect new developments in computer system instrumentation and simulation environments that will let researchers and architects better evaluate their designs before implementation. We also anticipate continued progress in using commercial workloads to evaluate server designs.
Interest in this special issue grew out of the Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW; http://iacoma.cs.uiuc.edu/caecw{98,99,00,01} and http://tesla.hpl.hp.com/caecw{02,03}) that we, along with Josep Torrellas at the University of Illinois at Urbana-Champaign, have organized for the past six years. We thank the workshop participants for many stimulating discussions and presentations; the authors of the articles in this special issue for providing such interesting studies to feature; and the anonymous reviewers.
Kimberly Keeton is a research scientist in the Storage Systems Department at Hewlett-Packard Laboratories, where her research focuses on workload characterization, data dependability, and self-managing storage systems. Keeton received a PhD in computer science from the University of California, Berkeley. She is a member of the IEEE, the ACM, and Usenix. Contact her at kkeeton@hpl.hp.com.
Russell Clapp is a principal engineer at Fabric7 Systems. His research interests include systems architecture and performance analysis. Clapp received a PhD in computer science and engineering from the University of Michigan. He is a member of the IEEE and the ACM. Contact him at rmcl@acm.org.
Ashwini Nanda is a research staff member at the IBM T.J. Watson Research Center. His research interests include computer systems architecture, performance, and rich media systems. He currently serves on the Editorial Board of IEEE Transactions on Parallel and Distributed Systems. Contact him at ashwini@us.ibm.com.