Displaying 1-50 out of 96 total
Single-Threaded vs. Multithreaded: Where Should We Focus?
Found in: IEEE Micro
By Joel Emer, Mark D. Hill, Yale N. Patt, Joshua J. Yi, Derek Chiou, Resit Sendag
Issue Date:November 2007
pp. 14-24
To continue to offer improvements in application performance, should computer architecture researchers and chip manufacturers focus on improving single-threaded or multithreaded performance? This panel, from the 2007 Workshop on Computer Architecture Resea...
 
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date:March 2007
pp. 367-378
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improv...
 
Select-Free Instruction Scheduling Logic
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Mary D. Brown, Jared Stark, Yale N. Patt
Issue Date:December 2001
pp. 204
Pipelining allows processors to exploit parallelism. Unfortunately, critical loops, pieces of logic that must evaluate in a single cycle to meet IPC (Instructions Per Cycle) goals, prevent deeper pipelining. In today's processors, one of these loops is the i...
 
Identifying Obstacles in the Path to More
Found in: Computer
By Yale N. Patt
Issue Date:December 1997
pp. 32
Our industry continually wants more: more performance, more functionality, lower power requirements, and cheaper cost. In many aspects of computing, however, providing more is not without obstacles. In the six articles in this issue, researchers at...
 
Guest Editor's Introduction: Real Machines: Design Choices/Engineering Trade-Offs
Found in: Computer
By Yale N. Patt
Issue Date:January 1989
pp. 8-10
No summary available.
   
Top Picks
Found in: IEEE Micro
By Yale N. Patt, Onur Mutlu
Issue Date:January 2011
pp. 6-10
This special issue is the eighth in an important tradition in the computer architecture community: IEEE Micro's Top Picks from the Computer Architecture Conferences. This tradition provides a means for sharing a sample of the ...
 
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Moinuddin K. Qureshi, Yale N. Patt
Issue Date:December 2006
pp. 423-432
This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the applic...
 
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication
Found in: IEEE Micro
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date:January 2007
pp. 94-104
The branch misprediction penalty is a major performance limiter and a major cause of wasted energy in high-performance processors. The diverge-merge processor reduces this penalty by dynamically predicating a wide range of hard-to-predict branches at runti...
 
A Case for MLP-Aware Cache Replacement
Found in: Computer Architecture, International Symposium on
By Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, Yale N. Patt
Issue Date:June 2006
pp. 167-178
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache misses in parallel is called Memory Level Parallelism (MLP). MLP is n...
 
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors
Found in: IEEE Transactions on Computers
By Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale N. Patt
Issue Date:December 2005
pp. 1556-1571
High-performance, out-of-order execution processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction algorithms. Although memory references generated on the wrong path do...
 
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery
Found in: Microarchitecture, IEEE/ACM International Symposium on
By David N. Armstrong, Hyesoon Kim, Onur Mutlu, Yale N. Patt
Issue Date:December 2004
pp. 119-128
Control and data speculation are widely used to improve processor performance. Correct speculation can reduce execution time, but incorrect speculation can lead to increased execution time and greater energy consumption. This paper p...
 
Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance
Found in: Computer Architecture and High Performance Computing, Symposium on
By Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale N. Patt
Issue Date:October 2004
pp. 2-9
High-performance processors employ aggressive speculation and prefetching techniques to increase performance. Speculative memory references caused by these techniques sometimes bring data into the caches that are not needed by correct execution. This paper...
 
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware
Found in: IEEE Transactions on Computers
By Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, Robert Cohn
Issue Date:September 2009
pp. 1153-1170
Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual-machine-based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of con...
 
Techniques for Efficient Processing in Runahead Execution Engines
Found in: Computer Architecture, International Symposium on
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:June 2005
pp. 370-381
Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occurs. Previous research has shown that this technique significantly i...
 
Using System-Level Models to Evaluate I/O Subsystem Designs
Found in: IEEE Transactions on Computers
By Gregory R. Ganger, Yale N. Patt
Issue Date:June 1998
pp. 667-678
We describe a system-level simulation model and show that it enables accurate predictions of both I/O subsystem and overall system performance. In contrast, the conventional approach for evaluating th...
 
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt
Issue Date:December 1996
pp. 191
To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential o...
 
The I/O subsystem - a candidate for improvement
Found in: Computer
By Yale N. Patt
Issue Date:March 1994
pp. 15-16
A computer system can be partitioned into hardware and the software executing on that hardware. The hardware consists of processor(s), memory, and...
 
Predicting Performance Impact of DVFS for Realistic Memory Systems
Found in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
By Rustam Miftakhutdinov, Eiman Ebrahimi, Yale N. Patt
Issue Date:December 2012
pp. 155-165
Dynamic voltage and frequency scaling (DVFS) can make modern processors more power and energy efficient if we can accurately predict the effect of frequency scaling on processor performance. State-of-the-art DVFS performance predictors, however, fail to ac...
 
MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
Found in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
By Khubaib, M. Aater Suleman, Milad Hashemi, Chris Wilkerson, Yale N. Patt
Issue Date:December 2012
pp. 305-316
Several researchers have recognized in recent years that today's workloads require a microarchitecture that can handle single-threaded code at high performance, and multi-threaded code at high throughput, while consuming no more energy than is necessary. ...
 
Energy Savings via Dead Sub-Block Prediction
Found in: 2012 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
By Marco A.Z. Alves, Khubaib, Eiman Ebrahimi, Veynu T. Narasiman, Carlos Villavieja, Philippe O.A. Navaux, Yale N. Patt
Issue Date:October 2012
pp. 51-58
Cache memories have traditionally been designed to exploit spatial locality by fetching entire cache lines from memory upon a miss. However, recent studies have shown that often the number of sub-blocks within a line that are actually used is low. Furtherm...
 
Data Marshaling for Multicore Systems
Found in: IEEE Micro
By M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib Khubaib, Yale N. Patt
Issue Date:January 2011
pp. 56-64
Dividing a program into segments and executing each segment at the core best suited to run it can improve performance and save power. When consecutive segments run on different cores, accesses to intersegment data incur cache misses. Data Marshali...
 
Prefetch-Aware Memory Controllers
Found in: IEEE Transactions on Computers
By Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
Issue Date:October 2011
pp. 1406-1430
Existing DRAM controllers employ rigid, nonadaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetches the same as demand requests, and others always prioritize demands over prefetches. However, no...
 
Accelerating Critical Section Execution with Asymmetric Multicore Architectures
Found in: IEEE Micro
By M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, Yale N. Patt
Issue Date:January 2010
pp. 60-70
Contention for critical sections can reduce performance and scalability by causing thread serialization. The proposed accelerated critical sections mechanism reduces this limitation. ACS executes critical sections on the high-performance core of a...
 
Prefetch-Aware DRAM Controllers
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
Issue Date:November 2008
pp. 200-209
Existing DRAM controllers employ rigid, non-adaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetch requests the same as demand requests, others always prioritize demand requests over prefetch re...
 
Achieving Out-of-Order Performance with Almost In-Order Complexity
Found in: Computer Architecture, International Symposium on
By Francis Tseng, Yale N. Patt
Issue Date:June 2008
pp. 3-12
There is still much performance to be gained by out-of-order processors with wider issue widths. However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements. This paper in...
 
Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching
Found in: IEEE Micro
By Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., Joel Emer
Issue Date:January 2008
pp. 91-98
The commonly used LRU replacement policy causes thrashing for memory-intensive workloads. A simple mechanism that dynamically changes the insertion policy used by LRU replacement reduces cache misses by 21 percent and requires a total storage overhead of l...
 
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
Found in: High-Performance Computer Architecture, International Symposium on
By Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:February 2007
pp. 63-74
High performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others....
 
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
Found in: High-Performance Computer Architecture, International Symposium on
By Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
Issue Date:February 2007
pp. 250-259
Caches are organized at a line-size granularity to exploit spatial locality. However, when spatial locality is low, many words in the cache line are not used. Unused words occupy cache space but do not contribute to cache hits. Filtering these words can al...
 
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses
Found in: IEEE Transactions on Computers
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:December 2006
pp. 1491-1508
While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta ...
 
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hyesoon Kim, M. Aater Suleman, Onur Mutlu, Yale N. Patt
Issue Date:March 2006
pp. 159-172
Static compilers use profiling to predict run-time program behavior. Generally, this requires multiple input sets to capture wide variations in run-time behavior. This is expensive in terms of resources and compilation time. We introduce a new mechanism, 2...
 
Wish Branches: Enabling Adaptive and Aggressive Predicated Execution
Found in: IEEE Micro
By Hyesoon Kim, Onur Mutlu, Yale N. Patt, Jared Stark
Issue Date:January 2006
pp. 48-58
The goal of wish branches is to use predicated execution for hard-to-predict dynamic branches, and branch prediction for easy-to-predict dynamic branches, thereby obtaining the best of both worlds. Wish loops, one class of wish branches, use predication to...
 
Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance
Found in: IEEE Micro
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:January 2006
pp. 10-20
Several simple techniques can make runahead execution more efficient by reducing the number of instructions executed and thereby reducing the additional energy consumption typically associated with runahead execution.
 
Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:November 2005
pp. 233-244
While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel technique, address-value delta ...
 
The V-Way Cache: Demand Based Associativity via Global Replacement
Found in: Computer Architecture, International Symposium on
By Moinuddin K. Qureshi, David Thompson, Yale N. Patt
Issue Date:June 2005
pp. 544-555
As processor speeds increase and memory latency becomes more critical, intelligent design and management of secondary caches becomes increasingly important. The efficiency of current set-associative caches is reduced because programs exhibit a non-uniform ...
 
Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors
Found in: Dependable Systems and Networks, International Conference on
By Moinuddin K. Qureshi, Onur Mutlu, Yale N. Patt
Issue Date:July 2005
pp. 434-443
The increasing transient fault rate will necessitate on-chip fault tolerance techniques in future processors. The speed gap between the processor and the memory is also increasing, causing the processor to stay idle for hundreds of cycles while wa...
 
On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor
Found in: IEEE Computer Architecture Letters
By Onur Mutlu, Hyesoon Kim, Jared Stark, Yale N. Patt
Issue Date:January 2005
pp. N/A
Previous research on runahead execution took it for granted as a prefetch-only technique. Even though the results of instructions independent of an L2 miss are correctly computed during runahead mode, previous approaches discarded those results instead of ...
 
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
Found in: High-Performance Computer Architecture, International Symposium on
By Onur Mutlu, Jared Stark, Chris Wilkerson, Yale N. Patt
Issue Date:February 2003
pp. 129
Today's high performance processors tolerate long latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have...
 
Microarchitectural Support for Precomputation Microthreads
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Robert S. Chappell, Francis Tseng, Adi Yoaz, Yale N. Patt
Issue Date:November 2002
pp. 74
Research has shown that precomputation microthreads can be useful for improving branch prediction and prefetching. However, it is not obvious how to provide the necessary microarchitectural support, and few details have been given in the literature. By jud...
 
Difficult-Path Branch Prediction Using Subordinate Microthreads
Found in: Computer Architecture, International Symposium on
By Robert S. Chappell, Francis Tseng, Yale N. Patt, Adi Yoaz
Issue Date:May 2002
pp. 307
Branch misprediction penalties continue to increase as microprocessor cores become wider and deeper. Thus, improving branch prediction accuracy remains an important challenge. Simultaneous Subordinate Microthreading (SSMT) provides a means to improve branc...
 
Simultaneous Subordinate Microthreading (SSMT)
Found in: Computer Architecture, International Symposium on
By Robert S. Chappell, Jared Stark, Steven K. Reinhardt, Yale N. Patt, Sangwook P. Kim
Issue Date:May 1999
pp. 186
Current work in Simultaneous Multithreading provides little benefit to programs that aren't partitioned into threads. We propose Simultaneous Subordinate Microthreading (SSMT) to correct this by spawning subordinate threads that perform optimizations on be...
 
Evaluation of Design Options for the Trace Cache Fetch Mechanism
Found in: IEEE Transactions on Computers
By Sanjay Jeram Patel, Daniel Holmes Friendly, Yale N. Patt
Issue Date:February 1999
pp. 193-204
In this paper, we examine some critical design features of a trace cache fetch engine for a 16-wide issue processor and evaluate their effects on performance. We evaluate path associativity, partial matching, and inacti...
 
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Daniel H. Friendly, Sanjay J. Patel, Yale N. Patt
Issue Date:December 1997
pp. 24
The increasing widths of superscalar processors are placing greater demands upon the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage. It is capable of supplying multiple fet...
 
Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions into the Reservation Stations Out-of-Order
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Jared Stark, Paul Racunas, Yale N. Patt
Issue Date:December 1997
pp. 34
In conventional processors, each instruction cache fetch brings in a group of instructions. Upon encountering an instruction cache miss, the processor will wait until the instruction cache miss is serviced before continuing to fetch any new instructions. T...
 
One Billion Transistors, One Uniprocessor, One Chip
Found in: Computer
By Yale N. Patt, Sanjay J. Patel, Marius Evers, Daniel H. Friendly, Jared Stark
Issue Date:September 1997
pp. 51-57
Researchers from the University of Michigan conclude that billion-transistor processors will be much as they are today, but just bigger, faster, and wider (issuing more instructions at once). The authors describe the key problems (instruction supp...
 
Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Po-Yung Chang, Marius Evers, Yale N. Patt
Issue Date:October 1996
pp. 48
Today's deeply pipelined, superscalar processors rely on accurate branch prediction in order to approach their performance potential. Branch mispredictions result in a flushing of the speculative information in the pipeline, thus limiting the amount of use...
 
The Effects of Mispredicted-Path Execution on Branch Prediction Structures
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Stephan Jourdan, Tse-Hao Hsing, Jared Stark, Yale N. Patt
Issue Date:October 1996
pp. 58
Branch prediction accuracies determined using trace-driven simulation do not include the effects of executing branches along a mispredicted path. However, branches along a mispredicted path will pollute the branch prediction structures if no recovery mecha...
 
The Microprocessor for Scientific Computing in the Year 2000
Found in: Computing in Science and Engineering
By Yale N. Patt
Issue Date:June 1996
pp. 42-43
No summary available.
 
Disk arrays: high-performance, high-reliability storage subsystems
Found in: Computer
By Gregory R. Ganger, Bruce L. Worthington, Robert Y. Hou, Yale N. Patt
Issue Date:March 1994
pp. 30-36
As the performance of other system components continues to improve rapidly, storage subsystem performance becomes increasingly important. Storage subsystem performance and reliability can be enhanced by logically grouping multiple disk drives into...
 
Guest Editor's Introduction: Experimental Research in Computer Architecture
Found in: Computer
By Yale N. Patt
Issue Date:January 1991
pp. 14-16
No summary available.
   
On Pipelining Dynamic Instruction Scheduling Logic
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Jared Stark, Mary D. Brown, Yale N. Patt
Issue Date:December 2000
pp. 57
A machine's performance is the product of its IPC (Instructions Per Cycle) and clock frequency. Recently, Palacharla, Jouppi, and Smith [3] warned that the dynamic instruction scheduling logic for current machines performs an atomic operation. Eit...
 