Automated Full-System Power Characterization
Found in: IEEE Micro
By Stijn Polfliet, Frederick Ryckbosch, Lieven Eeckhout
Issue Date:May 2011
pp. 46-59
A new framework automatically generates full-system multicore powermarks, or synthetic programs with desired power characteristics on multicore server platforms. The framework constructs full-system power models with error bounds on the power esti...
 
Fast, Accurate, and Validated Full-System Software Simulation of x86 Hardware
Found in: IEEE Micro
By Frederick Ryckbosch, Stijn Polfliet, Lieven Eeckhout
Issue Date:November 2010
pp. 46-56
This article presents a fast and accurate interval-based CPU timing model that is easily implemented and integrated in the COTSon full-system simulation infrastructure. Validation against real x86 hardware demonstrates the timing model's accuracy....
 
Workload Reduction and Generation Techniques
Found in: IEEE Micro
By Luk Van Ertvelde, Lieven Eeckhout
Issue Date:November 2010
pp. 57-65
Benchmarking is a fundamental aspect of computer system design. Recently proposed workload reduction and generation techniques include input reduction, sampling, code mutation, and benchmark synthesis. The authors discuss and compare these techniq...
 
Per-Thread Cycle Accounting
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout
Issue Date:January 2010
pp. 71-80
Resource sharing unpredictably affects per-thread performance in multithreaded architectures, but system software assumes all coexecuting threads make equal progress. Per-thread cycle accounting addresses this problem by tracking per-thread progre...
 
A Methodology for Analyzing Commercial Processor Performance Numbers
Found in: Computer
By Kenneth Hoste, Lieven Eeckhout
Issue Date:October 2009
pp. 70-76
The wealth of performance numbers provided by benchmarking corporations makes it difficult to detect trends across commercial machines. A proposed methodology, based on statistical data analysis, simplifies exploration of these machines' large datasets.
 
Studying Compiler-Microarchitecture Interactions through Interval Analysis
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Stijn Eyerman, Lieven Eeckhout, James E. Smith
Issue Date:September 2007
pp. 406
In modern processors, both the hardware implementation and optimizing compilers are very complex, and they often interact in unpredictable ways. A high performance microarchitecture typically issues instructions out-of-order and must deal with a number of ...
   
Microarchitecture-Independent Workload Characterization
Found in: IEEE Micro
By Kenneth Hoste, Lieven Eeckhout
Issue Date:May 2007
pp. 63-72
For computer designers, understanding the characteristics of workloads running on current and future computer systems is of utmost importance during microprocessor design. A microarchitecture-independent method ensures an accurate characterization of inher...
 
A Top-Down Approach to Architecting CPI Component Performance Counters
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout, Tejas Karkhanis, James E. Smith
Issue Date:January 2007
pp. 84-93
Software developers can gain insight into software-hardware interactions by decomposing processor performance into individual cycles-per-instruction components that differentiate cycles consumed in active computation from those spent handling various miss ...
 
Adaptive Prefetching for Multimedia Applications in Embedded Systems
Found in: Design, Automation and Test in Europe Conference and Exhibition
By Hassan Sbeyti, Smail Niar, Lieven Eeckhout
Issue Date:February 2004
pp. 21350
This paper presents a new and simple prefetching mechanism to improve the memory performance of multimedia applications. This method adapts the memory access mechanism to the access patterns as observed in the application. By doing so, performance is incre...
   
SWAP: Parallelization through Algorithm Substitution
Found in: IEEE Micro
By Hengjie Li, Wenting He, Yang Chen, Lieven Eeckhout, Olivier Temam, Chengyong Wu
Issue Date:July 2012
pp. 54-67
By explicitly indicating which algorithms they use and encapsulating these algorithms within software components, programmers make it possible for an algorithm-aware compiler to replace their original algorithm implementations with compatible parallel impl...
 
A mechanistic performance model for superscalar in-order processors
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Maximilien Breughe, Stijn Eyerman, Lieven Eeckhout
Issue Date:April 2012
pp. 14-24
Mechanistic processor performance modeling builds an analytical model from understanding the underlying mechanisms in the processor and provides fundamental insight in program-microarchitecture interactions, as well as microarchitecture structure scaling t...
 
Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Stijn Eyerman, Kristof Du Bois, Lieven Eeckhout
Issue Date:April 2012
pp. 145-155
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved speedup is not proportional to the number of cores and threads. Sublinear scaling may have multiple causes, such as poorly scalable synchronization leading...
 
The Multi-Program Performance Model: Debunking current practice in multi-core simulation
Found in: IEEE Workload Characterization Symposium
By Kenzo Van Craeynest, Lieven Eeckhout
Issue Date:November 2011
pp. 26-37
Composing a representative multi-program multi-core workload is non-trivial. A multi-core processor can execute multiple independent programs concurrently, and hence, any program mix can form a potential multi-program workload. Given the very large number ...
 
Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads
Found in: IEEE Workload Characterization Symposium
By Wim Heirman, Trevor E. Carlson, Shuai Che, Kevin Skadron, Lieven Eeckhout
Issue Date:November 2011
pp. 38-49
This paper proposes a methodology for analyzing parallel performance by building cycle stacks. A cycle stack quantifies where the cycles have gone, and provides hints towards optimization opportunities. We make the case that this is particularly interestin...
 
Ranking commercial machines through data transposition
Found in: IEEE Workload Characterization Symposium
By Beau Piccart, Andy Georges, Hendrik Blockeel, Lieven Eeckhout
Issue Date:November 2011
pp. 3-14
The performance numbers reported by benchmarking consortia and corporations provide little or no insight into the performance of applications of interest that are not part of the benchmark suite. This paper describes data transposition, a novel methodology...
 
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
Found in: SC Conference
By Trevor E. Carlson, Wim Heirman, Lieven Eeckhout
Issue Date:November 2011
pp. 1-12
Two major trends in high-performance computing, namely, larger numbers of cores and the growing size of on-chip cache memory, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulatio...
 
How sensitive is processor customization to the workload's input datasets?
Found in: Application Specific Processors, Symposium on
By Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, Olivier Temam, Chengyong Wu, Lieven Eeckhout
Issue Date:June 2011
pp. 1-7
Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a s...
 
Mechanistic-empirical processor performance modeling for constructing CPI stacks on real hardware
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Stijn Eyerman, Kenneth Hoste, Lieven Eeckhout
Issue Date:April 2011
pp. 216-226
Analytical processor performance modeling has received increased interest over the past few years. There are basically two approaches to constructing an analytical model: mechanistic modeling and empirical modeling. Mechanistic modeling builds up an analyt...
 
Trends in Server Energy Proportionality
Found in: Computer
By Frederick Ryckbosch, Stijn Polfliet, Lieven Eeckhout
Issue Date:September 2011
pp. 69-72
Server energy proportionality, as quantified by the proposed EP metric, has improved significantly, from 30-40 percent in 2007 to 50-80 percent today, but much more can be done to move systems closer to ideal.
 
AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Arun Arvind Nair, Lizy Kurian John, Lieven Eeckhout
Issue Date:December 2010
pp. 125-136
Soft error reliability is increasingly becoming a first-order design concern for microprocessors, as a result of higher transistor counts, shrinking device geometries and lowering of operating voltages. It is important for designers to be able to validate ...
 
Scenario-Based Resource Prediction for QoS-Aware Media Processing
Found in: Computer
By Juan Hamers, Lieven Eeckhout
Issue Date:October 2010
pp. 56-63
Media streams can be annotated with platform-independent scenario information to reflect frame-level decode complexity. This enables energy-efficient decoding, resource prediction, and quality-of-service management on single-core as well as multicore proce...
 
A Counter Architecture for Online DVFS Profitability Estimation
Found in: IEEE Transactions on Computers
By Stijn Eyerman, Lieven Eeckhout
Issue Date:November 2010
pp. 1576-1583
Dynamic voltage and frequency scaling (DVFS) is a well known and effective technique for reducing power consumption in modern microprocessors. An important concern though is to estimate its profitability in terms of performance and energy. Current DVFS pro...
 
Chip Multiprocessor Design Space Exploration through Statistical Simulation
Found in: IEEE Transactions on Computers
By Davy Genbrugge, Lieven Eeckhout
Issue Date:December 2009
pp. 1668-1681
Developing fast chip multiprocessor simulation techniques is a challenging problem. Solving this problem is especially valuable for design space exploration purposes during the early stages of the design cycle where a large number of design points need to ...
 
System-Level Performance Metrics for Multiprogram Workloads
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout
Issue Date:May 2008
pp. 42-53
Assessing the performance of multiprogram workloads running on multithreaded hardware is difficult because it involves a balance between single-program performance and overall system performance. This article argues for developing multiprogram performance ...
 
Characterizing the Unique and Diverse Behaviors in Existing and Emerging General-Purpose and Domain-Specific Benchmark Suites
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Kenneth Hoste, Lieven Eeckhout
Issue Date:April 2008
pp. 157-168
Characterizing and understanding emerging workload behavior is of vital importance to ensure next generation microprocessors perform well on their anticipated future workloads. This paper compares a number of benchmark suites from emerging application doma...
 
Representative Multiprogram Workloads for Multithreaded Processor Simulation
Found in: IEEE Workload Characterization Symposium
By Michael Van Biesbrouck, Lieven Eeckhout, Brad Calder
Issue Date:September 2007
pp. 193-203
Almost all new consumer-grade processors are capable of executing multiple programs simultaneously. The analysis of multiprogrammed workloads for multicore and SMT processors is challenging and time-consuming because there are many possible combinations of...
 
Exploring the Application Behavior Space Using Parameterized Synthetic Benchmarks
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Ajay M. Joshi, Lieven Eeckhout, Lizy K. John
Issue Date:September 2007
pp. 412
Computer architects and researchers face several challenges when using benchmarks in industry product development and academic research, namely: (1) Benchmarks only represent a sample of the application behavior space, (2) Benchmarks are rigid and measure ...
   
Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces
Found in: IEEE Transactions on Computers
By Davy Genbrugge, Lieven Eeckhout
Issue Date:January 2008
pp. 41-54
Microprocessor design is both complex and time-consuming: exploring a huge design space for identifying the optimal design under a number of constraints is infeasible using detailed architectural simulation of entire benchmark executions. Statistical simul...
 
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors
Found in: High-Performance Computer Architecture, International Symposium on
By Stijn Eyerman, Lieven Eeckhout
Issue Date:February 2007
pp. 240-249
A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by...
 
Efficient Sampling Startup for SimPoint
Found in: IEEE Micro
By Michael Van Biesbrouck, Brad Calder, Lieven Eeckhout
Issue Date:July 2006
pp. 32-42
Sampling techniques dramatically shorten simulation times for industry-standard benchmarks, but establishing the correct architecture and microarchitecture states at the beginning of each sample can be time-consuming. This article compares the accuracy and...
 
Measuring Benchmark Similarity Using Inherent Program Characteristics
Found in: IEEE Transactions on Computers
By Ajay Joshi, Aashish Phansalkar, Lieven Eeckhout, Lizy Kurian John
Issue Date:June 2006
pp. 769-782
This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark s...
 
NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation
Found in: Simulation Symposium, Annual
By Luk Van Ertvelde, Filip Hellebaut, Lieven Eeckhout, Koen De Bosschere
Issue Date:April 2006
pp. 168-177
Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation which selects a number of samples from the complete benchmark execution yields sub...
 
Space-Efficient 64-bit Java Objects through Selective Typed Virtual Addressing
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Kris Venstermans, Lieven Eeckhout, Koen De Bosschere
Issue Date:March 2006
pp. 76-86
Memory performance is an important design issue for contemporary systems given the ever increasing memory gap. This paper proposes a space-efficient Java object model for reducing the memory consumption of 64-bit Java virtual machines. We propose Selective...
 
Self-Monitored Adaptive Cache Warm-Up for Microprocessor Simulation
Found in: Computer Architecture and High Performance Computing, Symposium on
By Yue Luo, Lizy K. John, Lieven Eeckhout
Issue Date:October 2004
pp. 10-17
Simulation is the most important tool for computer architects to evaluate the performance of new computer designs. However, detailed simulation is extremely time consuming. Sampling is one of the techniques that effectively reduce simulation time. In order...
 
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies
Found in: Computer Architecture, International Symposium on
By Lieven Eeckhout, Robert H. Bell Jr., Bastiaan Stougie, Koen De Bosschere, Lizy K. John
Issue Date:June 2004
pp. 350
Designing a new microprocessor is extremely time-consuming. One of the contributing reasons is that computer designers rely heavily on detailed architectural simulations, which are very time-consuming. Recent work has focused on statistical simulation to a...
 
Efficient Microprocessor Design Space Exploration through Statistical Simulation
Found in: Simulation Symposium, Annual
By Lieven Eeckhout, Dirk Stroobandt, Koen De Bosschere
Issue Date:April 2003
pp. 233
To cope with the widening design gap, the ever increasing impact of technology, reflected in increased interconnect delay and power consumption, and the time-consuming simulations needed to define the architecture of a microprocessor, computer engineers ne...
 
Designing Computer Architecture Research Workloads
Found in: Computer
By Lieven Eeckhout, Hans Vandierendonck, Koen De Bosschere
Issue Date:February 2003
pp. 65-71
Although architectural simulators model microarchitectures at a high abstraction level, the increasing complexity of both the microarchitectures themselves and the applications that run on them make simulator use extremely time-consuming. Simulato...
 
Workload Design: Selecting Representative Program-Input Pairs
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Lieven Eeckhout, Hans Vandierendonck, Koen De Bosschere
Issue Date:September 2002
pp. 83
Having a representative workload of the target domain of a microprocessor is extremely important throughout its design. The composition of a workload involves two issues: (i) which benchmarks to select and (ii) which input data sets to select per benchmark...
 
Hybrid Analytical-Statistical Modeling for Efficiently Exploring Architecture and Workload Design Spaces
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Lieven Eeckhout, Koen De Bosschere
Issue Date:September 2001
pp. 0025
Microprocessor design time and effort are getting impractical due to the huge number of simulations that need to be done to evaluate various processor configurations for various workloads. An early design stage methodology could be useful to effi...
 
On the Feasibility of Fixed-Length Block Structured Architectures
Found in: Australasian Computer Architecture Conference
By Lieven Eeckhout, Koen De Bosschere, Henk Neefs
Issue Date:February 2000
pp. 17
Scaling contemporary superscalar microarchitectures to higher levels of parallelism in future technologies seems to be impractical due to the increasing complexity. In this paper, we show that a fixed-length block structured instruction set architecture (B...
 
BarrierPoint: Sampled simulation of multi-threaded applications
Found in: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
By Trevor E. Carlson, Wim Heirman, Kenzo Van Craeynest, Lieven Eeckhout
Issue Date:March 2014
pp. 2-12
Sampling is a well-known technique to speed up architectural simulation of long-running workloads while maintaining accurate performance predictions. A number of sampling techniques have recently been developed that extend well-known single-threaded techni...
   
Undersubscribed threading on clustered cache architectures
Found in: 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
By Wim Heirman, Trevor E. Carlson, Kenzo Van Craeynest, Ibrahim Hur, Aamer Jaleel, Lieven Eeckhout
Issue Date:February 2014
pp. 678-689
Recent many-core processors such as Intel's Xeon Phi and GPGPUs specialize in running highly scalable parallel applications at high performance while simultaneously embracing energy efficiency as a first-order design constraint. The traditional belief is t...
   
Fairness-aware scheduling on single-ISA heterogeneous multi-cores
Found in: 2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)
By Kenzo Van Craeynest, Shoaib Akram, Wim Heirman, Aamer Jaleel, Lieven Eeckhout
Issue Date:September 2013
pp. 177-187
Single-ISA heterogeneous multi-cores consisting of small (e.g., in-order) and big (e.g., out-of-order) cores dramatically improve energy- and power-efficiency by scheduling workloads on the most appropriate core type. A significant body of recent work has ...
   
Restating the Case for Weighted-IPC Metrics to Evaluate Multiprogram Workload Performance
Found in: IEEE Computer Architecture Letters
By Stijn Eyerman, Lieven Eeckhout
Issue Date:May 2013
pp. 1
Weighted speedup is nowadays the most commonly used multiprogram workload performance metric. Weighted speedup is a weighted-IPC metric, i.e., the multiprogram IPC of each program is first weighted with its isolated IPC. Recently, Michaud questions the val...
 
Sampled simulation of multi-threaded applications
Found in: 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
By Trevor E. Carlson, Wim Heirman, Lieven Eeckhout
Issue Date:April 2013
pp. 2-12
Sampling is a well-known workload reduction technique that allows one to speed up architectural simulation while accurately predicting performance. Previous sampling methods have been shown to accurately predict single-threaded application runtime based on...
   
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Found in: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:March 2014
pp. 591-606
The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts ...
     
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Chengzhong Xu, Chuntao Jiang, Hai Jin, Lieven Eeckhout, Trevor E. Carlson, Wim Heirman, Xiaofei Liao, Zhibin Yu
Issue Date:December 2013
pp. 1-24
Computer architects rely heavily on microarchitecture simulation to evaluate design alternatives. Unfortunately, cycle-accurate simulation is extremely slow, being at least 4 to 6 orders of magnitude slower than real hardware. This longstanding problem is ...
     
Accelerating an application domain with specialized functional units
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Carlos Álvarez, Cecilia González-Álvarez, Daniel Jiménez-González, Jennifer B. Sartor, Lieven Eeckhout
Issue Date:December 2013
pp. 1-25
Hardware specialization has received renewed interest recently as chips are hitting power limits. Chip designers of traditional processor architectures have primarily focused on general-purpose computing, partially due to time-to-market pressure and simple...
     
Selecting representative benchmark inputs for exploring microprocessor design spaces
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Lieven Eeckhout, Maximilien B. Breughe
Issue Date:December 2013
pp. 1-24
The design process of a microprocessor requires representative workloads to steer the search process toward an optimum design point for the target application domain. However, considering a broad set of workloads to cover the large space of potential workl...
     
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
Found in: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications (OOPSLA '13)
By Lieven Eeckhout, Stijn Eyerman, Jennifer B. Sartor, Kristof Du Bois
Issue Date:October 2013
pp. 355-372
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we present bottle graphs, a powerful analysis tool that visualizes...
     