Studying Compiler-Microarchitecture Interactions through Interval Analysis
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Stijn Eyerman, Lieven Eeckhout, James E. Smith
Issue Date:September 2007
pp. 406
In modern processors, both the hardware implementation and optimizing compilers are very complex, and they often interact in unpredictable ways. A high performance microarchitecture typically issues instructions out-of-order and must deal with a number of ...
   
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:March 2009
pp. 1-33
A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by...
     
A mechanistic performance model for superscalar in-order processors
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Maximilien Breughe, Stijn Eyerman, Lieven Eeckhout
Issue Date:April 2012
pp. 14-24
Mechanistic processor performance modeling builds an analytical model from understanding the underlying mechanisms in the processor and provides fundamental insight in program-microarchitecture interactions, as well as microarchitecture structure scaling t...
 
Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Stijn Eyerman, Kristof Du Bois, Lieven Eeckhout
Issue Date:April 2012
pp. 145-155
Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved speedup is not proportional to the number of cores and threads. Sublinear scaling may have multiple causes, such as poorly scalable synchronization leading...
 
How sensitive is processor customization to the workload's input datasets?
Found in: Application Specific Processors, Symposium on
By Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, Olivier Temam, Chengyong Wu, Lieven Eeckhout
Issue Date:June 2011
pp. 1-7
Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a s...
 
Mechanistic-empirical processor performance modeling for constructing CPI stacks on real hardware
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Stijn Eyerman, Kenneth Hoste, Lieven Eeckhout
Issue Date:April 2011
pp. 216-226
Analytical processor performance modeling has received increased interest over the past few years. There are basically two approaches to constructing an analytical model: mechanistic modeling and empirical modeling. Mechanistic modeling builds up an analyt...
 
A Counter Architecture for Online DVFS Profitability Estimation
Found in: IEEE Transactions on Computers
By Stijn Eyerman, Lieven Eeckhout
Issue Date:November 2010
pp. 1576-1583
Dynamic voltage and frequency scaling (DVFS) is a well known and effective technique for reducing power consumption in modern microprocessors. An important concern though is to estimate its profitability in terms of performance and energy. Current DVFS pro...
 
Per-Thread Cycle Accounting
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout
Issue Date:January 2010
pp. 71-80
Resource sharing unpredictably affects per-thread performance in multithreaded architectures, but system software assumes all coexecuting threads make equal progress. Per-thread cycle accounting addresses this problem by tracking per-thread progre...
 
System-Level Performance Metrics for Multiprogram Workloads
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout
Issue Date:May 2008
pp. 42-53
Assessing the performance of multiprogram workloads running on multithreaded hardware is difficult because it involves a balance between single-program performance and overall system performance. This article argues for developing multiprogram performance ...
 
A Top-Down Approach to Architecting CPI Component Performance Counters
Found in: IEEE Micro
By Stijn Eyerman, Lieven Eeckhout, Tejas Karkhanis, James E. Smith
Issue Date:January 2007
pp. 84-93
Software developers can gain insight into software-hardware interactions by decomposing processor performance into individual cycles-per-instruction components that differentiate cycles consumed in active computation from those spent handling various miss ...
 
Restating the Case for Weighted-IPC Metrics to Evaluate Multiprogram Workload Performance
Found in: IEEE Computer Architecture Letters
By Stijn Eyerman, Lieven Eeckhout
Issue Date:May 2013
pp. 1
Weighted speedup is nowadays the most commonly used multiprogram workload performance metric. Weighted speedup is a weighted-IPC metric, i.e., the multiprogram IPC of each program is first weighted with its isolated IPC. Recently, Michaud questions the val...
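As a rough illustration of the weighted-IPC idea described in this abstract, the sketch below computes weighted speedup for a hypothetical two-program mix. It assumes that "weighted with its isolated IPC" means dividing each program's multiprogram IPC by its isolated IPC, which is the usual definition of weighted speedup; the function name and the IPC values are made up for the example and do not come from the paper.

```python
def weighted_speedup(multi_ipc, isolated_ipc):
    """Sum each program's multiprogram IPC divided by its isolated IPC.

    Assumption: this is the common definition of weighted speedup;
    the paper itself may use different notation.
    """
    if len(multi_ipc) != len(isolated_ipc):
        raise ValueError("need one isolated IPC per program in the mix")
    return sum(m / s for m, s in zip(multi_ipc, isolated_ipc))


# Hypothetical IPC values, chosen only for illustration.
ipc_in_mix = [1.0, 0.5]   # IPC of each program when co-running
ipc_alone = [2.0, 1.0]    # IPC of each program when running in isolation

ws = weighted_speedup(ipc_in_mix, ipc_alone)
print(ws)  # each program runs at half its isolated speed, so 0.5 + 0.5 = 1.0
```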
 
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Found in: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:March 2014
pp. 591-606
The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts ...
     
Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications
Found in: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications (OOPSLA '13)
By Lieven Eeckhout, Stijn Eyerman, Jennifer B. Sartor, Kristof Du Bois
Issue Date:October 2013
pp. 355-372
Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we present bottle graphs, a powerful analysis tool that visualizes...
     
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior
Found in: Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13)
By Jennifer B. Sartor, Kristof Du Bois, Lieven Eeckhout, Stijn Eyerman
Issue Date:June 2013
pp. 511-522
Analyzing multi-threaded programs is quite challenging, but is necessary to obtain good multicore performance while saving energy. Due to synchronization, certain threads make others wait, because they hold a lock or have yet to reach a barrier. We call th...
     
Per-thread cycle accounting in multicore processors
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Kristof Du Bois, Lieven Eeckhout, Stijn Eyerman
Issue Date:January 2013
pp. 1-22
While multicore processors improve overall chip throughput and hardware utilization, resource sharing among the cores leads to unpredictable performance for the individual threads running on a multicore processor. Unpredictable per-thread performance becom...
     
Probabilistic modeling for job symbiosis scheduling on SMT processors
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:June 2012
pp. 1-27
Symbiotic job scheduling improves simultaneous multithreading (SMT) processor performance by coscheduling jobs that have “compatible” demands on the processor's shared resources. Existing approaches however require a sampling phase, evaluate a ...
     
A first-order mechanistic model for architectural vulnerability factor
Found in: Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12)
By Arun Arvind Nair, Lieven Eeckhout, Lizy Kurian John, Stijn Eyerman
Issue Date:June 2012
pp. 273-284
Soft error reliability has become a first-order design criterion for modern microprocessors. Architectural Vulnerability Factor (AVF) modeling is often used to capture the probability that a radiation-induced fault in a hardware structure will manifest as ...
     
Fine-grained DVFS using on-chip regulators
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:April 2011
pp. 1-24
Limit studies on Dynamic Voltage and Frequency Scaling (DVFS) provide apparently contradictory conclusions. On the one hand early limit studies report that DVFS is effective at large timescales (on the order of million(s) of cycles) with large scaling over...
     
Modeling critical sections in Amdahl's law and its implications for multicore design
Found in: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:June 2010
pp. 72-ff
This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extendi...
     
Probabilistic job symbiosis modeling for SMT processor scheduling
Found in: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:March 2010
pp. 222-230
Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have 'compatible' demands on the processor's shared resources. Existing approaches however require a sampling phase, evaluate a limited numbe...
     
A mechanistic performance model for superscalar out-of-order processors
Found in: ACM Transactions on Computer Systems (TOCS)
By James E. Smith, Lieven Eeckhout, Stijn Eyerman, Tejas Karkhanis
Issue Date:May 2009
pp. 1-37
A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions ...
     
Per-thread cycle accounting in SMT processors
Found in: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems (ASPLOS '09)
By Lieven Eeckhout, Stijn Eyerman
Issue Date:March 2009
pp. 23-27
This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates the execution times for each of the threads had they been executed alone, while they are running simultaneously on the SMT processor. This i...
     
A performance counter architecture for computing accurate CPI components
Found in: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS-XII)
By James E. Smith, Lieven Eeckhout, Stijn Eyerman, Tejas Karkhanis
Issue Date:October 2006
pp. 109-es
A common way of representing processor performance is to use Cycles per Instruction (CPI) 'stacks' which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the...
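The CPI stack described in this abstract can be pictured as a simple additive breakdown: a baseline CPI plus one component per miss event type. The sketch below is a minimal illustration under that reading only; the component names and values are hypothetical, and it says nothing about how the paper's counter architecture actually measures the components.

```python
def cpi_stack(base_cpi, miss_components):
    """Build a CPI stack: baseline CPI plus per-miss-event CPI components.

    Returns the stack as a dict and the total CPI (sum of all components).
    """
    stack = {"base": base_cpi, **miss_components}
    total = sum(stack.values())
    return stack, total


# Hypothetical component names and values, for illustration only.
stack, total = cpi_stack(
    0.5,
    {"branch_mispredict": 0.25, "l2_miss": 0.75},
)

for name, cpi in stack.items():
    # Print each component and its share of the total CPI.
    print(f"{name:>18}: {cpi:.2f} ({cpi / total:.0%})")
print(f"{'total':>18}: {total:.2f}")
```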
     