Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures
Found in: Computer Architecture, International Symposium on
By M. S. Hrishikesh, Stephen W. Keckler, Doug Burger, Vikas Agarwal
Issue Date:June 2000
pp. 248
The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvemen...
 
Phase change memory architecture and the quest for scalability
Found in: Communications of the ACM
By Benjamin C. Lee, Doug Burger, Engin Ipek, Onur Mutlu
Issue Date:July 2010
pp. 99-106
Memory scaling is in jeopardy as charge storage and sensing mechanisms become less reliable for prevalent memory technologies, such as dynamic random access memory (DRAM). In contrast, phase change memory (PCM) relies on programmable resistances, as well a...
     
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
Found in: High-Performance Computer Architecture, International Symposium on
By Wei-Fen Lin, Steven K. Reinhardt, Doug Burger
Issue Date:January 2001
pp. 301
In this paper, we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte le...
 
Charles R. (Chuck) Moore (1961—2012)
Found in: IEEE Micro
By Doug Burger, Stephen W. Keckler, Mark Papermaster
Issue Date:July 2012
pp. 3-5
This column discusses the life and achievements of Chuck Moore and his impact on the computer architecture community.
 
The Future of Architectural Simulation
Found in: IEEE Micro
By James C. Hoe, Doug Burger, Joel Emer, Derek Chiou, Resit Sendag, Joshua Yi
Issue Date:May 2010
pp. 8-18
Simulation is an indispensable tool for evaluation and analysis throughout the development cycle of a computer system, and even after the computer system is built. How simulation should evolve as the complexity of computer systems continues to gro...
 
Speculative Incoherent Cache Protocols
Found in: IEEE Micro
By Jaehyuk Huh, Doug Burger, Jichuan Chang, Gurindar S. Sohi
Issue Date:November 2004
pp. 104-109
Coherence decoupling is a microarchitectural mechanism that implements separate protocols for speculative use and for the eventual verification of values. The technique reduces the effect of long communication latencies while mitigating the burdens on the ...
 
Scalable Hardware Memory Disambiguation for High-ILP Processors
Found in: IEEE Micro
By Simha Sethumadhavan, Rajagopalan Desikan, Doug Burger, Charles R. Moore, Stephen W. Keckler
Issue Date:November 2004
pp. 118-127
Power is a major problem for scaling the hardware needed to support memory disambiguation in future out-of-order architectures. In current machines, the traditional detection of memory ordering violations requires frequent associative searches of state pro...
 
Billion-Transistor Architectures: There and Back Again
Found in: Computer
By Doug Burger, James R. Goodman
Issue Date:March 2004
pp. 22-28
In September 1997, Computer published a special issue on billion-transistor microprocessor architectures. Comparing that issue's predictions about the trends that would drive architectural development with the factors that subsequently emerged shows a grea...
 
Filtering Superfluous Prefetches Using Density Vectors
Found in: Computer Design, International Conference on
By Doug Burger, Thomas R. Puzak, Wei-Fen Lin, Steven K. Reinhardt
Issue Date:September 2001
pp. 124
A previous evaluation of scheduled region prefetching showed that this technique eliminates the bulk of main-memory stall time for applications with spatial locality. The downside to that aggressive prefetching scheme is that, even when it succes...
 
Measuring Experimental Error in Microprocessor Simulation
Found in: Computer Architecture, International Symposium on
By Rajagopalan Desikan, Doug Burger, Stephen W. Keckler
Issue Date:July 2001
pp. 266
We measure the experimental error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation-based studies. We describe the methodology that we used to validate a m...
 
What the Future Holds for Solid-State Memory
Found in: Computer
By Karin Strauss, Doug Burger
Issue Date:January 2014
pp. 24-31
The memory industry faces significant disruption due to challenges related to scaling. Future memory systems will have more heterogeneity at individual levels of the hierarchy, with management support from multiple layers across the stack.
 
Multicore Model from Abstract Single Core Inputs
Found in: IEEE Computer Architecture Letters
By Emily Blem, Hadi Esmaeilzadeh, Renee St. Amant, Karthikeyan Sankaralingam, Doug Burger
Issue Date:July 2013
pp. 59-62
This paper describes a first order multicore model to project a tighter upper bound on performance than previous Amdahl's Law based approaches. The speedup over a known baseline is a function of the core performance, microarchitectural features, applicatio...
 
Neural Acceleration for General-Purpose Approximate Programs
Found in: IEEE Micro
By Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger
Issue Date:May 2013
pp. 16-27
This work proposes an approximate algorithmic transformation and a new class of accelerators, called neural processing units (NPUs). NPUs leverage the approximate algorithmic transformation that converts regions of code from a Von Neumann model to a neural...
 
Reconfigurable computing in the era of post-silicon scaling [panel discussion]
Found in: IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2013)
By Eric Chung, Doug Burger, Mike Butts, Jan Gray, Chuck Thacker, Kees Vissers, John Wawrzynek
Issue Date:April 2013
pp. xvi
Summary form only given, as follows. Although transistor densities continue to scale exponentially, the failure of Dennard Scaling prevents us from maximally utilizing die area in future power-constrained multicore processors, a phenomenon referred to as...
   
How to implement effective prediction and forwarding for fusable dynamic multicore architectures
Found in: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
By Behnam Robatmili, Dong Li, Hadi Esmaeilzadeh, Sibi Govindan, Aaron Smith, Andrew Putnam, Doug Burger, Stephen W. Keckler
Issue Date:February 2013
pp. 460-471
Dynamic multicore architectures, that fuse and split cores at run time, potentially offer a level of performance/energy agility that static multicore designs cannot achieve. Conventional ISAs, however, have scalability limits to fusion. EDGE-based designs ...
 
Neural Acceleration for General-Purpose Approximate Programs
Found in: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
By Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger
Issue Date:December 2012
pp. 449-460
This paper describes a learning-based approach to the acceleration of approximate programs. We describe the Parrot transformation, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the lear...
 
Exploiting microarchitectural redundancy for defect tolerance
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Premkishore Shivakumar, Stephen W. Keckler, Charles R. Moore, Doug Burger
Issue Date:September 2012
pp. 35-42
The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size, creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Method...
 
Dark Silicon and the End of Multicore Scaling
Found in: IEEE Micro
By Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, Doug Burger
Issue Date:May 2012
pp. 122-134
A key question for the microprocessor research and design community is whether scaling multicores will provide the performance and value needed to scale down many more technology generations. To provide a quantitative answer to this question, a comprehensi...
 
Panel statement
Found in: 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011)
By Per Stenström, Doug Burger, Wen-mei Hwu, Vipin Kumar, Kunle Olukotun, David Padua, Burton Smith
Issue Date:May 2011
pp. 877
No summary available.
 
The Good Block: Hardware/Software Design for Composable, Block-Atomic Processors
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Bertrand A. Maher, Katherine E. Coons, Kathryn S. McKinley, Doug Burger
Issue Date:February 2011
pp. 9-16
Power consumption, complexity, and on-chip latency are forcing computer systems to exploit more parallelism efficiently. Explicit Dataflow Graph Execution (EDGE) architectures seek to expose parallelism by dividing programs into blocks of efficient d...
 
Exploiting criticality to reduce bottlenecks in distributed uniprocessors
Found in: High-Performance Computer Architecture, International Symposium on
By Behnam Robatmili, Sibi Govindan, Doug Burger, Stephen W. Keckler
Issue Date:February 2011
pp. 431-442
Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads. The performance scalability of these systems, however, is limited due to partitioning overheads. This paper addresses two of the key performance...
 
Phase-Change Technology and the Future of Main Memory
Found in: IEEE Micro
By Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, Doug Burger
Issue Date:January 2010
pp. 143
Phase-change memory may enable continued scaling of main memories, but PCM has higher access latencies, incurs higher power costs, and wears out more quickly than DRAM. This article discusses how to mitigate these limitations through buffer sizing...
 
End-to-end validation of architectural power models
Found in: Low Power Electronics and Design, International Symposium on
By Madhu Saravana Sibi Govindan, Stephen W. Keckler, Doug Burger
Issue Date:August 2009
pp. 383-388
While researchers have invested substantial effort to build architectural power models, validating such models has proven difficult at best. In this paper, we examine the accuracy of commonly used architectural power models using the TRIPS system as a case...
 
Mixed-Signal Approximate Computation: A Neural Predictor Case Study
Found in: IEEE Micro
By Renée St. Amant, Daniel A. Jiménez, Doug Burger
Issue Date:January 2009
pp. 104-115
As transistors shrink and processors trend toward low power, maintaining precise digital behavior grows more expensive. Replacing digital units with analog equivalents sometimes allows similar computation to be performed at higher speed using less...
 
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Haiming Liu, Michael Ferdman, Jaehyuk Huh, Doug Burger
Issue Date:November 2008
pp. 222-233
Data caches in general-purpose microprocessors often contain mostly dead blocks and are thus used inefficiently. To improve cache efficiency, dead blocks should be identified and evicted early. Prior schemes predict the death of a block immediately after i...
 
Low-power, high-performance analog neural branch prediction
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Renee St. Amant, Daniel A. Jimenez, Doug Burger
Issue Date:November 2008
pp. 447-458
Shrinking transistor sizes and a trend toward low-power processors have caused increased leakage, high per-device variation and a larger number of hard and soft errors. Maintaining precise digital behavior on these devices grows more expensive with each te...
 
Strategies for mapping dataflow blocks to distributed hardware
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Behnam Robatmili, Katherine E. Coons, Doug Burger, Kathryn S. McKinley
Issue Date:November 2008
pp. 23-34
Distributed processors must balance communication and concurrency. When dividing instructions among the processors, key factors are the available concurrency, criticality of dependence chains, and communication penalties. The amount of concurrency determin...
 
Counting Dependence Predictors
Found in: Computer Architecture, International Symposium on
By Franziska Roesner, Doug Burger, Stephen W. Keckler
Issue Date:June 2008
pp. 215-226
Modern processors rely on memory dependence prediction to execute load instructions as early as possible, speculating that they are not dependent on an earlier, unissued store. To date, the most sophisticated dependence predictors, such as Store Sets, have...
 
Composable Lightweight Processors
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Changkyu Kim, Simha Sethumadhavan, M.S. Govindan, Nitya Ranganathan, Divya Gulati, Doug Burger, Stephen W. Keckler
Issue Date:December 2007
pp. 381-394
Modern chip multiprocessors (CMPs) are designed to exploit both instruction-level parallelism (ILP) within processors and thread-level parallelism (TLP) within and across processors. However, the number of processors and the granularity of each processor...
 
On-Chip Interconnection Networks of the TRIPS Chip
Found in: IEEE Micro
By Paul Gratz, Changkyu Kim, Karthikeyan Sankaralingam, Heather Hanson, Premkishore Shivakumar, Stephen W. Keckler, Doug Burger
Issue Date:September 2007
pp. 41-50
The TRIPS chip prototypes two networks on chip to demonstrate the viability of a routed interconnection fabric for memory and operand traffic. In a 170-million-transistor custom ASIC chip, these NoCs provide system performance within 28 percent of ideal no...
 
A NUCA Substrate for Flexible CMP Cache Sharing
Found in: IEEE Transactions on Parallel and Distributed Systems
By Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhang, Doug Burger, Stephen W. Keckler
Issue Date:August 2007
pp. 1028-1040
We propose an organization for the on-chip memory system of a chip multiprocessor in which 16 processors share a 16-Mbyte pool of 64 level-2 (L2) cache banks. The L2 cache is organized as a nonuniform cache architecture...
 
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
Found in: Networks-on-Chip, International Symposium on
By Paul Gratz, Karthikeyan Sankaralingam, Heather Hanson, Premkishore Shivakumar, Robert McDonald, Stephen W. Keckler, Doug Burger
Issue Date:May 2007
pp. 7-17
Microarchitecturally integrated on-chip networks, or micronets, are candidates to replace busses for processor component interconnect in future processor designs. For micronets, tight coupling between processor microarchitecture and network architecture is...
 
Dataflow Predication
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Aaron Smith, Ramadass Nagarajan, Karthikeyan Sankaralingam, Robert McDonald, Doug Burger, Stephen W. Keckler, Kathryn S. McKinley
Issue Date:December 2006
pp. 89-102
Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction p...
 
Merging Head and Tail Duplication for Convergent Hyperblock Formation
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Bertrand A. Maher, Aaron Smith, Doug Burger, Kathryn S. McKinley
Issue Date:December 2006
pp. 65-76
VLIW and EDGE (Explicit Data Graph Execution) architectures rely on compilers to form high-quality hyperblocks for good performance. These compilers typically perform hyperblock formation, loop unrolling, and scalar optimizations in a fixed or...
 
Compiling for EDGE Architectures
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Aaron Smith, Jon Gibson, Bertrand Maher, Nick Nethercote, Bill Yoder, Doug Burger, Kathryn S. McKinley, Jim Burrill
Issue Date:March 2006
pp. 185-195
Explicit Data Graph Execution (EDGE) architectures offer the possibility of high instruction-level parallelism with energy efficiency. In EDGE architectures, the compiler breaks a program into a sequence of structured blocks that the hardware exec...
 
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Ramadass Nagarajan, Sundeep K. Kushwaha, Doug Burger, Kathryn S. McKinley, Calvin Lin, Stephen W. Keckler
Issue Date:October 2004
pp. 74-84
Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and vari...
 
Scaling to the End of Silicon with EDGE Architectures
Found in: Computer
By Doug Burger, Stephen W. Keckler, Kathryn S. McKinley, Mike Dahlin, Lizy K. John, Calvin Lin, Charles R. Moore, James Burrill, Robert G. McDonald, William Yoder, the TRIPS Team
Issue Date:July 2004
pp. 44-55
Post-RISC microprocessor designs must introduce new ISAs to address the challenges that modern CMOS technologies pose while also exploiting the massive levels of integration now possible. To meet these challenges, the TRIPS Team at the University of Texas ...
 
Universal Mechanisms for Data-Parallel Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, Doug Burger
Issue Date:December 2003
pp. 303
Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and pro...
 
Scalable Hardware Memory Disambiguation for High ILP Processors
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Simha Sethumadhavan, Rajagopalan Desikan, Doug Burger, Charles R. Moore, Stephen W. Keckler
Issue Date:December 2003
pp. 399
This paper describes several methods for improving the scalability of memory disambiguation hardware for future high ILP processors. As the number of in-flight instructions grows with issue width and pipeline depth, the load/store queues (LSQ) threaten to ...
 
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Found in: IEEE Micro
By Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles Moore
Issue Date:November 2003
pp. 46-51
The TRIPS architecture seeks to deliver system-level configurability to applications and runtime systems. It does so by employing the concept of polymorphism, which permits the runtime system to configure the hardware execution resources to match ...
 
Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches
Found in: IEEE Micro
By Changkyu Kim, Doug Burger, Stephen W. Keckler
Issue Date:November 2003
pp. 99-107
Nonuniform cache access designs solve the on-chip wire delay problem for future large integrated caches. By embedding a network in the cache, NUCA designs let data migrate within the cache, clustering the working set nearest the processor.
 
Routed Inter-ALU Networks for ILP Scalability and Performance
Found in: Computer Design, International Conference on
By Karthikeyan Sankaralingam, Vincent Ajay Singh, Stephen W. Keckler, Doug Burger
Issue Date:October 2003
pp. 170
Modern processors rely heavily on broadcast networks to bypass instruction results to dependent instructions in the pipeline. However, as clock rates increase, architectures get wider, and pipelines get deeper, broadcasting becomes more complex, slower, an...
 
Exploiting Microarchitectural Redundancy For Defect Tolerance
Found in: Computer Design, International Conference on
By Premkishore Shivakumar, Stephen W. Keckler, Charles R. Moore, Doug Burger
Issue Date:October 2003
pp. 481
The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size, creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Method...
 
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
Found in: IEEE Transactions on Computers
By Deepu Talla, Lizy Kurian John, Doug Burger
Issue Date:August 2003
pp. 1015-1031
Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match very...
 
Guided Region Prefetching: A Cooperative Hardware/Software Approach
Found in: Computer Architecture, International Symposium on
By Zhenlin Wang, Doug Burger, Kathryn S. McKinley, Steven K. Reinhardt, Charles C. Weems
Issue Date:June 2003
pp. 388
Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes have been proposed to tolerate these latencies. Software prefetching typically provides b...
 
Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
Found in: Computer Architecture, International Symposium on
By Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles R. Moore
Issue Date:June 2003
pp. 422
This paper describes the polymorphous TRIPS architecture which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in...
 
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
Found in: Dependable Systems and Networks, International Conference on
By Premkishore Shivakumar, Michael Kistler, Stephen W. Keckler, Doug Burger, Lorenzo Alvisi
Issue Date:June 2002
pp. 389
This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing...
 
The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays
Found in: Computer Architecture, International Symposium on
By M. S. Hrishikesh, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar, Norman P. Jouppi, Keith I. Farkas
Issue Date:May 2002
pp. 14
Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has been provided, in equal measure, by smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high...
 
A Design Space Evaluation of Grid Processor Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Issue Date:December 2001
pp. 40
In this paper, we survey the design space of a new class of architectures called Grid Processor Architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing sup...
 
Designing a Modern Memory Hierarchy with Hardware Prefetching
Found in: IEEE Transactions on Computers
By Wei-Fen Lin, Steven K. Reinhardt, Doug Burger
Issue Date:November 2001
pp. 1202-1218
In this paper, we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that, even with an aggressive, next-generation memory system using four Direct Rambus channels an...
 