Patching Processor Design Errors with Programmable Hardware
Found in: IEEE Micro
By Smruti Sarangi, Satish Narayanasamy, Bruce Carneal, Abhishek Tiwari, Brad Calder, Josep Torrellas
Issue Date:January 2007
pp. 12-25
Equipping processors with programmable hardware to patch design errors lets manufacturers release regular hardware patches, avoiding costly chip recalls and potentially speeding time to market. For each error detected, the manufacturer creates a fingerprin...
 
BugNet: Recording Application-Level Execution for Deterministic Replay Debugging
Found in: IEEE Micro
By Satish Narayanasamy, Gilles Pokam, Brad Calder
Issue Date:January 2006
pp. 100-109
With software's increasing complexity, providing efficient hardware support for software debugging is critical. Hardware support is necessary to observe and capture, with little or no overhead, the exact execution of a program. Providing this ability to de...
 
Discovering and Exploiting Program Phases
Found in: IEEE Micro
By Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder
Issue Date:November 2003
pp. 84-93
In a single second, a modern processor can execute billions of instructions, and a program's behavior can change many times. Some programs change behavior drastically, switching between periods of high and low performance, yet system design and opt...
 
Representative Multiprogram Workloads for Multithreaded Processor Simulation
Found in: IEEE Workload Characterization Symposium
By Michael Van Biesbrouck, Lieven Eeckhout, Brad Calder
Issue Date:September 2007
pp. 193-203
Almost all new consumer-grade processors are capable of executing multiple programs simultaneously. The analysis of multiprogrammed workloads for multicore and SMT processors is challenging and time-consuming because there are many possible combinations of...
 
A Loop Correlation Technique to Improve Performance Auditing
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Jeremy Lau, Matthew Arnold, Michael Hind, Brad Calder
Issue Date:September 2007
pp. 259-269
Performance auditing is an online optimization strategy that empirically measures the effectiveness of an optimization on a particular code region. It has the potential to greatly improve performance and prevent degradations due to compiler optimizations. ...
 
Accelerating and Adapting Precomputation Threads for Efficient Prefetching
Found in: High-Performance Computer Architecture, International Symposium on
By Weifeng Zhang, Dean M. Tullsen, Brad Calder
Issue Date:February 2007
pp. 85-95
Speculative precomputation enables effective cache prefetching for even irregular memory access behavior, by using an alternate thread on a multithreaded or multi-core architecture. This paper describes a system that constructs and runs precomputation base...
 
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Jack Sampson, Ruben Gonzalez, Jean-Francois Collard, Norman P. Jouppi, Mike Schlansker, Brad Calder
Issue Date:December 2006
pp. 235-246
We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vector machines. Parallelizing code in this manner leads to a high freq...
 
Efficient Sampling Startup for SimPoint
Found in: IEEE Micro
By Michael Van Biesbrouck, Brad Calder, Lieven Eeckhout
Issue Date:July 2006
pp. 32-42
Sampling techniques dramatically shorten simulation times for industry-standard benchmarks, but establishing the correct architecture and microarchitecture states at the beginning of each sample can be time-consuming. This article compares the accuracy and...
 
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Weifeng Zhang, Brad Calder, Dean M. Tullsen
Issue Date:March 2006
pp. 50-64
Software prefetching has been demonstrated as a powerful technique to tolerate long load latencies. However, to be effective, prefetching must target the most critical (frequently missing) loads, and prefetch them sufficiently far in advance. This...
 
Selecting Software Phase Markers with Code Structure Analysis
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Jeremy Lau, Erez Perelman, Brad Calder
Issue Date:March 2006
pp. 135-146
Most programs are repetitive, where similar behavior can be seen at different execution times. Algorithms have been proposed that automatically group similar portions of a program's execution into phases, where samples of execution in the same pha...
 
Dynamic Phase Analysis for Cycle-Close Trace Generation
Found in: Hardware/Software Codesign and System Synthesis, International Conference on
By Rajesh Gupta, Brad Calder, Jeremy Lau, Cristiano Pereira
Issue Date:September 2005
pp. 321-326
For embedded system development, several companies provide cross-platform development tools to aid in debugging, prototyping and optimization of programs. These are full system emulation systems that can emulate the final binary to be run on the real board...
 
Variational Path Profiling
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Erez Perelman, Trishul Chilimbi, Brad Calder
Issue Date:September 2005
pp. 7-16
Current profiling techniques are good at identifying where time is being spent during program execution. These techniques are not as good at pinpointing exactly where in the execution there are definite opportunities a programmer can exploit with o...
 
An Event-Driven Multithreaded Dynamic Optimization Framework
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Weifeng Zhang, Brad Calder, Dean M. Tullsen
Issue Date:September 2005
pp. 87-98
Dynamic optimization has the potential to adapt the program's behavior at run-time to deliver performance improvements over static optimization. Dynamic optimization systems usually perform their optimization in series with the application's execu...
 
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Found in: Computer Architecture, International Symposium on
By Satish Narayanasamy, Gilles Pokam, Brad Calder
Issue Date:June 2005
pp. 284-295
Significant time is spent by companies trying to reproduce and fix the bugs that occur in released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected bef...
 
A Dependency Chain Clustered Microarchitecture
Found in: Parallel and Distributed Processing Symposium, International
By Satish Narayanasamy, Hong Wang, Perry Wang, John Shen, Brad Calder
Issue Date:April 2005
pp. 21b
In this paper we explore a new clustering approach for reducing the complexity of wide issue in-order processors based on EPIC architectures. Complexity effectiveness is achieved by heavily clustering the pipeline from decode to commit stage without the ne...
 
Transition Phase Classification and Prediction
Found in: High-Performance Computer Architecture, International Symposium on
By Jeremy Lau, Stefan Schoenmackers, Brad Calder
Issue Date:February 2005
pp. 278-289
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed on-line systems automatically group these similar intervals of execution into phases, where the intervals in a phase have homogeneous behavior and simil...
 
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder
Issue Date:December 2004
pp. 183-194
A simultaneous multithreading (SMT) processor can issue instructions from several threads every cycle, allowing it to effectively hide various instruction latencies; this effect increases with the number of simultaneous contexts supported. However, each ad...
 
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Nathan Tuck, Brad Calder, George Varghese
Issue Date:December 2004
pp. 209-220
Buffer overflow vulnerabilities are currently the most prevalent security vulnerability; they are responsible for over half of the CERT advisories issued in the last three years. Since many attacks exploit buffer overflow vulnerabilities, techniques that p...
 
Creating Converged Trace Schedules Using String Matching
Found in: High-Performance Computer Architecture, International Symposium on
By Satish Narayanasamy, Yuanfang Hu, Suleyman Sair, Brad Calder
Issue Date:February 2004
pp. 210
This paper focuses on generating efficient software pipelined schedules for in-order machines, which we call Converged Trace Schedules. For a candidate loop, we form a string of trace block identifiers by hashing together addresses of aggressively schedule...
 
Picking Statistically Valid and Early Simulation Points
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Erez Perelman, Greg Hamerly, Brad Calder
Issue Date:October 2003
pp. 244
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry-standard benchmark can take weeks to months to complete. To address this issue, we have recently proposed using Simulation Poi...
 
Phase Tracking and Prediction
Found in: Computer Architecture, International Symposium on
By Timothy Sherwood, Suleyman Sair, Brad Calder
Issue Date:June 2003
pp. 336
In a single second, a modern processor can execute billions of instructions. Obtaining a bird's-eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle-by-cycle examination. In many programs, behavio...
 
A Pipelined Memory Architecture for High Throughput Network Processors
Found in: Computer Architecture, International Symposium on
By Timothy Sherwood, George Varghese, Brad Calder
Issue Date:June 2003
pp. 288
Designing ASICs for each new generation of backbone routers is a time intensive and fiscally draining process. In this paper we focus on the design of a programmable architecture for backbone routers, based on the manipulation of wide irregular memory word...
 
Phi-Predication for Light-Weight If-Conversion
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Weihaw Chuang, Brad Calder, Jeanne Ferrante
Issue Date:March 2003
pp. 179
Predicated execution can eliminate hard-to-predict branches and help to enable instruction-level parallelism. Many current predication variants exist where the result update is conditional based upon the outcome of the guarding predicate. However,...
 
Incorporating Predicate Information into Branch Predictors
Found in: High-Performance Computer Architecture, International Symposium on
By Beth Simon, Brad Calder, Jeanne Ferrante
Issue Date:February 2003
pp. 53
Predicated execution can be used to alleviate the costs associated with frequently mispredicted branches. This is accomplished by trading the cost of a mispredicted branch for execution of both paths following the conditional branch. ...
 
Catching Accurate Profiles in Hardware
Found in: High-Performance Computer Architecture, International Symposium on
By Satish Narayanasamy, Timothy Sherwood, Suleyman Sair, Brad Calder, George Varghese
Issue Date:February 2003
pp. 269
Run-time optimization is one of the most important ways of getting performance out of modern processors. Techniques such as prefetching, trace caching, and memory disambiguation are all based upon the principle of observation followed by adaptat...
 
Pointer Cache Assisted Prefetching
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Jamison Collins, Suleyman Sair, Brad Calder, Dean M. Tullsen
Issue Date:November 2002
pp. 62
Data prefetching effectively reduces the negative effects of long load latencies on the performance of modern processors. Hardware prefetchers employ hardware structures to predict future memory addresses based on previous patterns. Thread-based prefetcher...
 
Quantifying Instruction Criticality
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Eric S. Tune, Dean M. Tullsen, Brad Calder
Issue Date:September 2002
pp. 104
Information about instruction criticality can be used to control the application of micro-architectural resources efficiently. To this end, several groups have proposed methods to predict critical instructions. This paper presents a framework that allows u...
 
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Timothy Sherwood, Erez Perelman, Brad Calder
Issue Date:September 2001
pp. 3
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry-standard benchmark can take weeks to months to complete. To overcome this problem, researchers choose a very small portion of...
 
Reducing Delay with Dynamic Selection of Compression Formats
Found in: High-Performance Distributed Computing, International Symposium on
By Chandra Krintz, Brad Calder
Issue Date:August 2001
pp. 266
Internet computing is facilitated by the remote execution methodology in which programs transfer to a destination for execution. Since transfer time can substantially degrade performance of remotely executed (mobile) programs, file compression is...
 
Automated Design of Finite State Machine Predictors for Customized Processors
Found in: Computer Architecture, International Symposium on
By Timothy Sherwood, Brad Calder
Issue Date:July 2001
pp. 86
Customized processors use compiler analysis and design automation techniques to take a generalized architectural model and create a specific instance of it which is optimized to a given application or set of applications. These processors offer t...
 
Optimizations Enabled by a Decoupled Front-End Architecture
Found in: IEEE Transactions on Computers
By Glenn Reinman, Brad Calder, Todd Austin
Issue Date:April 2001
pp. 338-355
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires t...
 
Dynamic Prediction of Critical Path Instructions
Found in: High-Performance Computer Architecture, International Symposium on
By Eric Tune, Dongning Liang, Dean M. Tullsen, Brad Calder
Issue Date:January 2001
pp. 185
Modern processors come close to executing as fast as true dependences allow. The particular dependences that constrain execution speed constitute the critical path of execution. To optimize the performance of the processor, we either have to redu...
 
Fetch Directed Instruction Prefetching
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Glenn Reinman, Brad Calder, Todd Austin
Issue Date:November 1999
pp. 16
Instruction supply is a crucial component of processor performance. Instruction prefetching has been proposed as a mechanism to help reduce instruction cache misses, which in turn can help increase instruction supply to the processor. In this paper we exam...
 
Predicated Static Single Assignment
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Lori Carter, Beth Simon, Brad Calder, Larry Carter, Jeanne Ferrante
Issue Date:October 1999
pp. 245
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural mechanism that increases instruction level parallelism by removing branches an...
 
A Scalable Front-End Architecture for Fast Instruction Delivery
Found in: Computer Architecture, International Symposium on
By Glenn Reinman, Brad Calder, Todd Austin
Issue Date:May 1999
pp. 234
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instructio...
 
Selective Value Prediction
Found in: Computer Architecture, International Symposium on
By Brad Calder, Glenn Reinman, Dean M. Tullsen
Issue Date:May 1999
pp. 64
Value Prediction is a relatively new technique to increase instruction-level parallelism by breaking true data dependence chains. A value prediction architecture produces values, which may be later consumed by instructions that execute speculatively using ...
 
Instruction Recycling on a Multiple-Path Processor
Found in: High-Performance Computer Architecture, International Symposium on
By Steven Wallace, Dean M. Tullsen, Brad Calder
Issue Date:January 1999
pp. 44
Processors that can simultaneously execute multiple paths of execution will only exacerbate the fetch bandwidth problem already plaguing conventional processors. On a multiple-path processor, which speculatively executes less likely paths of hard-to-predic...
 
Value Profiling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Brad Calder, Peter Feller, Alan Eustace
Issue Date:December 1997
pp. 259
Identifying variables as invariant or constant at compile-time allows the compiler to perform optimizations including constant folding, code specialization, and partial evaluation. Some variables, which cannot be labeled as constants, may exhibit semi-inva...
 
Procedure Placement Using Temporal Ordering Information
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder
Issue Date:December 1997
pp. 303
Instruction cache performance is very important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate during execution. This means that the performance of an executabl...
 
Predictor-Directed Stream Buffers
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Timothy Sherwood, Suleyman Sair, Brad Calder
Issue Date:December 2000
pp. 42
An effective method for reducing the effect of load latency in modern processors is data prefetching. One form of data prefetching, stream buffers, has been shown to be particularly effective due to its ability to detect data streams and run ahead...
 
Next Cache Line and Set Prediction
Found in: Computer Architecture, International Symposium on
By Dirk Grunwald, Brad Calder
Issue Date:June 1995
pp. 287
Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of pr...
 
Instruction Cache Fetch Policies for Speculative Execution
Found in: Computer Architecture, International Symposium on
By Jean-Loup Baer, Brad Calder, Dirk Grunwald, Dennis Lee
Issue Date:June 1995
pp. 357
Current trends in processor design are pointing to deeper and wider pipelines and superscalar architectures. The efficient use of these resources requires speculative execution, a technique whereby the processor continues executing the predicted path of a ...
 
Inside Windows Azure: The Challenges and Opportunities of a Cloud Operating System
Found in: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14)
By Brad Calder
Issue Date:March 2014
pp. 1-2
Cloud operating systems provide on-demand, scalable compute and storage resources. They allow service developers to focus on their business logic by simplifying many portions of their service, including resource management, provisioning, monitoring, and ap...
     
Editorial
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Brad Calder, Dean Tullsen
Issue Date:May 2008
pp. 1-1
For many millions of users, 3D virtual worlds provide an engaging, immersive experience heightened by a synergistic combination of visual realism with dynamic control of the user’s movement within the virtual world. For individuals with visual or dex...
     
Phase-Based Cache Reconfiguration for a Highly-Configurable Two-Level Cache Hierarchy
Found in: Proceedings of the 18th ACM Great Lakes symposium on VLSI (GLSVLSI '08)
By Ann Gordon-Ross, Brad Calder, Jeremy Lau
Issue Date:May 2008
pp. 1-37
Phase-based tuning methodologies specialize system parameters for each application phase of execution. Parameters are varied during execution, as opposed to remaining fixed as in an application-based tuning methodology. Prior work and logic suggest phase-...
     
Automatically Classifying Benign and Harmful Data Races Using Replay Analysis
Found in: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (PLDI '07)
By Andrew Edwards, Brad Calder, Jordan Tigani, Satish Narayanasamy, Zhenghao Wang
Issue Date:June 2007
pp. 22-31
Many concurrency bugs in multi-threaded programs are due to data races. There have been many efforts to develop static and dynamic mechanisms to automatically find the data races. Most of the prior work has focused on finding the data races and eliminating ...
     
Transient Fault Prediction Based on Anomalies in Processor Events
Found in: Proceedings of the conference on Design, automation and test in Europe (DATE '07)
By Ayse K. Coskun, Brad Calder, Satish Narayanasamy
Issue Date:April 2007
pp. 1140-1145
Future microprocessors will be highly susceptible to transient errors as the sizes of transistors decrease due to CMOS scaling. Prior techniques advocated full scale structural or temporal redundancy to achieve fault tolerance. Though they can provide comp...
     
Introduction
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Brad Calder, Dean Tullsen
Issue Date:March 2007
pp. 1-es
Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. This paper proposes a three-step approach, called single-dimension software pipelining (SSP), to software pipeline a ...
     
Unbounded Page-Based Transactional Memory
Found in: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS-XII)
By Brad Calder, Ganesh Venkatesh, Gilles Pokam, Jack Sampson, Michael Van Biesbrouck, Osvaldo Colavin, Satish Narayanasamy, Weihaw Chuang
Issue Date:October 2006
pp. 109-es
Exploiting thread level parallelism is paramount in the multicore era. Transactions enable programmers to expose such parallelism by greatly simplifying the multi-threaded programming model. Virtualized transactions (unbounded in space and time) are desira...
     
Recording Shared Memory Dependencies Using Strata
Found in: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS-XII)
By Brad Calder, Cristiano Pereira, Satish Narayanasamy
Issue Date:October 2006
pp. 109-es
Significant time is spent by companies trying to reproduce and fix bugs. BugNet and FDR are recent architecture proposals that provide architecture support for deterministic replay debugging. They focus on continuously recording information about the progr...
     