Phase change memory architecture and the quest for scalability
Found in: Communications of the ACM
By Benjamin C. Lee, Doug Burger, Engin Ipek, Onur Mutlu
Issue Date:July 2010
pp. 99-106
Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as dynamic random access memory (DRAM). In contrast, phase change memory (PCM) relies on programmable resistances, as well a...
     
Top Picks
Found in: IEEE Micro
By Yale N. Patt, Onur Mutlu
Issue Date:January 2011
pp. 6-10
This special issue is the eighth in an important tradition in the computer architecture community: IEEE Micro's Top Picks from the Computer Architecture Conferences. This tradition provides a means for sharing a sample of the ...
 
Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Shared Memory Controllers
Found in: IEEE Micro
By Onur Mutlu, Thomas Moscibroda
Issue Date:January 2009
pp. 22-32
Uncontrolled interthread interference in main memory can destroy individual threads' memory-level parallelism, effectively serializing the memory requests of a thread whose latencies would otherwise have largely overlapped, thereby reducing single...
 
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
Found in: High-Performance Computer Architecture, International Symposium on
By Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:February 2007
pp. 63-74
High performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others....
 
HAT: Heterogeneous Adaptive Throttling for On-Chip Networks
Found in: 2012 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
By Kevin Kai-Wei Chang, Rachata Ausavarungnirun, Chris Fallin, Onur Mutlu
Issue Date:October 2012
pp. 9-18
The network-on-chip (NoC) is a primary shared resource in a chip multiprocessor (CMP) system. As core counts continue to increase and applications become increasingly data-intensive, the network load will also increase, leading to more congestion in the ne...
 
A case for small row buffers in non-volatile main memories
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Justin Meza, Jing Li, Onur Mutlu
Issue Date:September 2012
pp. 484-485
DRAM-based main memories have read operations that destroy the read data, and as a result, must buffer large amounts of data on each array access to keep chip costs low. Unfortunately, system-level trends such as increased memory contention in multi-core a...
 
Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management
Found in: IEEE Computer Architecture Letters
By Justin Meza, Jichuan Chang, HanBin Yoon, Onur Mutlu, Parthasarathy Ranganathan
Issue Date:July 2012
pp. 61-64
Hybrid main memories composed of DRAM as a cache to scalable non-volatile memories such as phase-change memory (PCM) can provide much larger storage capacity than traditional main memories. A key challenge for enabling high-performance and scalable hybrid ...
 
MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect
Found in: Networks-on-Chip, International Symposium on
By Chris Fallin, Greg Nazario, Xiangyao Yu, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu
Issue Date:May 2012
pp. 1-10
A conventional Network-on-Chip (NoC) router uses input buffers to store in-flight packets. These buffers improve performance, but consume significant power. It is possible to bypass these buffers when they are empty, reducing dynamic power, but static buff...
 
CHIPPER: A low-complexity bufferless deflection router
Found in: High-Performance Computer Architecture, International Symposium on
By Chris Fallin, Chris Craik, Onur Mutlu
Issue Date:February 2011
pp. 144-155
As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router archi...
 
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter
Issue Date:December 2010
pp. 65-76
In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently executing threads. The memory scheduling algorithm should resolve memory contention by arbitrating memory access in such a way that competing threads progress a...
 
Online design bug detection: RTL analysis, flexible mechanisms, and evaluation
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Kypros Constantinides, Onur Mutlu, Todd Austin
Issue Date:November 2008
pp. 282-293
Higher level of resource integration and the addition of new features in modern multi-processors put a significant pressure on their verification. Although a large amount of resources and time are devoted to the verification phase of modern processors, man...
 
Prefetch-Aware DRAM Controllers
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
Issue Date:November 2008
pp. 200-209
Existing DRAM controllers employ rigid, non-adaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetch requests the same as demand requests, others always prioritize demand requests over prefetch re...
 
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
Found in: Computer Architecture, International Symposium on
By Onur Mutlu, Thomas Moscibroda
Issue Date:June 2008
pp. 63-74
In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads’ DRAM-...
 
Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Kypros Constantinides, Onur Mutlu, Todd Austin, Valeria Bertacco
Issue Date:December 2007
pp. 97-108
As silicon process technology scales deeper into the nanometer regime, hardware defects are becoming more common. Such defects are bound to hinder the correct operation of future processor systems, unless new online techniques become available to detect ...
 
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Onur Mutlu, Thomas Moscibroda
Issue Date:December 2007
pp. 146-160
DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP) system. Memory requests from different threads can interfere with each other. Existing memory access scheduling techniques try to optimize the overall data throughput obtaine...
 
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date:March 2007
pp. 367-378
Dynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improv...
 
A Case for MLP-Aware Cache Replacement
Found in: Computer Architecture, International Symposium on
By Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, Yale N. Patt
Issue Date:June 2006
pp. 167-178
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache misses in parallel is called Memory Level Parallelism (MLP). MLP is n...
 
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Hyesoon Kim, M. Aater Suleman, Onur Mutlu, Yale N. Patt
Issue Date:March 2006
pp. 159-172
Static compilers use profiling to predict run-time program behavior. Generally, this requires multiple input sets to capture wide variations in run-time behavior. This is expensive in terms of resources and compilation time. We introduce a new mechanism, 2...
 
Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance
Found in: IEEE Micro
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:January 2006
pp. 10-20
Several simple techniques can make runahead execution more efficient by reducing the number of instructions executed and thereby reducing the additional energy consumption typically associated with runahead execution.
 
Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:November 2005
pp. 233-244
While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel technique, address-value delta ...
 
Techniques for Efficient Processing in Runahead Execution Engines
Found in: Computer Architecture, International Symposium on
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:June 2005
pp. 370-381
Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occurs. Previous research has shown that this technique significantly i...
 
Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors
Found in: Dependable Systems and Networks, International Conference on
By Moinuddin K. Qureshi, Onur Mutlu, Yale N. Patt
Issue Date:July 2005
pp. 434-443
The increasing transient fault rate will necessitate on-chip fault tolerance techniques in future processors. The speed gap between the processor and the memory is also increasing, causing the processor to stay idle for hundreds of cycles while wa...
 
On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor
Found in: IEEE Computer Architecture Letters
By Onur Mutlu, Hyesoon Kim, Jared Stark, Yale N. Patt
Issue Date:January 2005
pp. N/A
Previous research on runahead execution took it for granted as a prefetch-only technique. Even though the results of instructions independent of an L2 miss are correctly computed during runahead mode, previous approaches discarded those results instead of ...
 
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery
Found in: Microarchitecture, IEEE/ACM International Symposium on
By David N. Armstrong, Hyesoon Kim, Onur Mutlu, Yale N. Patt
Issue Date:December 2004
pp. 119-128
Control and data speculation are widely used to improve processor performance. Correct speculation can reduce execution time, but incorrect speculation can lead to increased execution time and greater energy consumption. This paper p...
 
Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance
Found in: Computer Architecture and High Performance Computing, Symposium on
By Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale N. Patt
Issue Date:October 2004
pp. 2-9
High-performance processors employ aggressive speculation and prefetching techniques to increase performance. Speculative memory references caused by these techniques sometimes bring data into the caches that are not needed by correct execution. This paper...
 
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
Found in: High-Performance Computer Architecture, International Symposium on
By Onur Mutlu, Jared Stark, Chris Wilkerson, Yale N. Patt
Issue Date:February 2003
pp. 129
Today's high performance processors tolerate long latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have...
 
Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory
Found in: 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
By Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, Onur Mutlu
Issue Date:June 2014
pp. 467-478
Memory devices represent a key component of datacenter total cost of ownership (TCO), and techniques used to reduce errors that occur on these devices increase this cost. Existing approaches to providing reliability for memory devices pessimistically treat...
 
MISE: Providing performance predictability and improving fairness in shared main memory systems
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Lavanya Subramanian, Vivek Seshadri, Yoongu Kim, Ben Jaiyen, Onur Mutlu
Issue Date:February 2013
pp. 639-650
Applications running concurrently on a multicore system interfere with each other at the main memory. This interference can slow down different applications differently. Accurately estimating the slow down of each application in such a system can enable me...
 
Application-to-core mapping policies to reduce memory system interference in multi-core systems
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, Mani Azimi
Issue Date:February 2013
pp. 107-118
Future many-core processors are likely to concurrently execute a large number of diverse applications. How these applications are mapped to cores largely determines the interference between these applications in critical shared hardware resources. This pap...
 
Tiered-latency DRAM: A low latency and low cost DRAM architecture
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu
Issue Date:February 2013
pp. 615-626
The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems....
 
Row buffer locality aware caching policies for hybrid memories
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, Onur Mutlu
Issue Date:September 2012
pp. 337-344
Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, ...
 
Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman S. Unsal, Ken Mai
Issue Date:September 2012
pp. 94-101
With the continued scaling of NAND flash and multi-level cell technology, flash-based storage has gained widespread use in systems ranging from mobile platforms to enterprise servers. However, the robustness of NAND flash cells is an increasing concern, es...
 
Thread Cluster Memory Scheduling
Found in: IEEE Micro
By Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter
Issue Date:January 2011
pp. 78-89
Memory schedulers in multicore systems should carefully schedule memory requests from different threads to ensure high system performance and fair, fast progress of each thread. No existing memory scheduler provides both the highest system perform...
 
Aérgia: A Network-on-Chip Exploiting Packet Latency Slack
Found in: IEEE Micro
By Reetuparna Das, Onur Mutlu, Thomas Moscibroda, Chita R. Das
Issue Date:January 2011
pp. 29-41
A traditional Network-on-Chip (NoC) employs simple arbitration strategies, such as round robin or oldest first, which treat packets equally regardless of the source applications' characteristics. This is suboptimal because packets can have differe...
 
Data Marshaling for Multicore Systems
Found in: IEEE Micro
By M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib Khubaib, Yale N. Patt
Issue Date:January 2011
pp. 56-64
Dividing a program into segments and executing each segment at the core best suited to run it can improve performance and save power. When consecutive segments run on different cores, accesses to intersegment data incur cache misses. Data Marshali...
 
Prefetch-Aware Memory Controllers
Found in: IEEE Transactions on Computers
By Chang Joo Lee, Onur Mutlu, Veynu Narasiman, Yale N. Patt
Issue Date:October 2011
pp. 1406-1430
Existing DRAM controllers employ rigid, nonadaptive scheduling and buffer management policies when servicing prefetch requests. Some controllers treat prefetches the same as demand requests, and others always prioritize demands over prefetches. However, no...
 
QuaLe: A Quantum-Leap Inspired Model for Non-stationary Analysis of NoC Traffic in Chip Multi-processors
Found in: Networks-on-Chip, International Symposium on
By Paul Bogdan, Miray Kas, Radu Marculescu, Onur Mutlu
Issue Date:May 2010
pp. 241-248
This paper identifies non-stationary effects in grid like Network-on-Chip (NoC) traffic and proposes QuaLe, a novel statistical physics-inspired model, that can account for non-stationarity observed in packet arrival processes. Using a wide set of real app...
 
Accelerating Critical Section Execution with Asymmetric Multicore Architectures
Found in: IEEE Micro
By M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, Yale N. Patt
Issue Date:January 2010
pp. 60-70
Contention for critical sections can reduce performance and scalability by causing thread serialization. The proposed accelerated critical sections mechanism reduces this limitation. ACS executes critical sections on the high-performance core of a...
 
Phase-Change Technology and the Future of Main Memory
Found in: IEEE Micro
By Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, Doug Burger
Issue Date:January 2010
pp. 143-143
Phase-change memory may enable continued scaling of main memories, but PCM has higher access latencies, incurs higher power costs, and wears out more quickly than DRAM. This article discusses how to mitigate these limitations through buffer sizing...
 
A Flexible Software-Based Framework for Online Detection of Hardware Defects
Found in: IEEE Transactions on Computers
By Kypros Constantinides, Onur Mutlu, Todd Austin, Valeria Bertacco
Issue Date:August 2009
pp. 1063-1079
This work proposes a new, software-based, defect detection and diagnosis technique. We introduce a novel set of instructions, called Access-Control Extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodic...
 
Virtual Program Counter (VPC) Prediction: Very Low Cost Indirect Branch Prediction Using Conditional Branch Prediction Hardware
Found in: IEEE Transactions on Computers
By Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, Robert Cohn
Issue Date:September 2009
pp. 1153-1170
Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual-machine-based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of con...
 
Guest Editors' Introduction: Interaction of Many-Core Computer Architecture and Operating Systems
Found in: IEEE Micro
By Sangyeun Cho, Tao Li, Onur Mutlu
Issue Date:May 2008
pp. 2-5
Rapid changes in platform hardware resources with the evolution of many-core architectures will require a fundamental reexamination of mainstream system-software design decisions to support multiple cores and to efficiently manage on-chip hardware resource...
 
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
Found in: Computer Architecture, International Symposium on
By Engin Ipek, Onur Mutlu, José F. Martínez, Rich Caruana
Issue Date:June 2008
pp. 39-50
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid acces...
 
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication
Found in: IEEE Micro
By Hyesoon Kim, José A. Joao, Onur Mutlu, Yale N. Patt
Issue Date:January 2007
pp. 94-104
The branch misprediction penalty is a major performance limiter and a major cause of wasted energy in high-performance processors. The diverge-merge processor reduces this penalty by dynamically predicating a wide range of hard-to-predict branches at runti...
 
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses
Found in: IEEE Transactions on Computers
By Onur Mutlu, Hyesoon Kim, Yale N. Patt
Issue Date:December 2006
pp. 1491-1508
While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta ...
 
Wish Branches: Enabling Adaptive and Aggressive Predicated Execution
Found in: IEEE Micro
By Hyesoon Kim, Onur Mutlu, Yale N. Patt, Jared Stark
Issue Date:January 2006
pp. 48-58
The goal of wish branches is to use predicated execution for hard-to-predict dynamic branches, and branch prediction for easy-to-predict dynamic branches, thereby obtaining the best of both worlds. Wish loops, one class of wish branches, use predication to...
 
An Analysis of the Performance Impact of Wrong-Path Memory References on Out-of-Order and Runahead Execution Processors
Found in: IEEE Transactions on Computers
By Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale N. Patt
Issue Date:December 2005
pp. 1556-1571
High-performance, out-of-order execution processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction algorithms. Although memory references generated on the wrong path do...
 
The Dirty-Block Index
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Vivek Seshadri, Abhishek Bhowmick, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry
Issue Date:June 2014
pp. 157-168
On-chip caches maintain multiple pieces of metadata about each cached block—e.g., dirty bit, coherence information, ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to...
   
Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, Onur Mutlu
Issue Date:June 2014
pp. 361-372
Memory isolation is a key property of a reliable and secure computing system—an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology scales down to smaller dimensions, i...
   
Improving DRAM performance by parallelizing refreshes with accesses
Found in: 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)
By Kevin Kai-Wei Chang, Donghyuk Lee, Zeshan Chishti, Alaa R. Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu
Issue Date:February 2014
pp. 356-367
Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR (double data rate) DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire DRAM rank from serving memory r...
   