UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development
Found in: IEEE Computer Architecture Letters
By David August, Jonathan Chang, Sylvain Girbal, Daniel Gracia-Perez, Gilles Mouchard, David A. Penry, Olivier Temam, Neil Vachharajani
Issue Date:July 2007
pp. 45-48
Simulator development is already a huge burden for many academic and industry research groups; future complex or heterogeneous multi-cores, as well as the multiplicity of performance metrics and required functionality, will make matters worse. We present a...
 
Challenges in Computer Architecture Evaluation
Found in: Computer
By Kevin Skadron, Margaret Martonosi, David I. August, Mark D. Hill, David J. Lilja, Vijay S. Pai
Issue Date:August 2003
pp. 30-36
Reasoning about today's tremendously complex computer systems is difficult and developing them is expensive. Detailed software simulations are thus essential for evaluating computer architecture ideas. Industry uses simulation extensively during p...
 
Optimizations for a Simulator Construction System Supporting Reusable Components
Found in: Design Automation Conference
By David A. Penry, David I. August
Issue Date:June 2003
pp. 926
Exploring a large portion of the microprocessor design space requires the rapid development of efficient simulators. While some systems support rapid model development through the structural composition of reusable concurrent components, the Liberty Simula...
 
Microarchitectural Exploration with Liberty
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Manish Vachharajani, Neil Vachharajani, David A. Penry, Jason A. Blome, David I. August
Issue Date:November 2002
pp. 271
To find the best designs, architects must rapidly simulate many design alternatives and have confidence in the results. Unfortunately, the most prevalent simulator construction methodology, hand-writing monolithic simulators in sequential programming langu...
 
Parallelizing Sequential Code
Found in: IEEE Micro
By David I. August
Issue Date:July 2012
pp. 6-7
This introduction to the special issue discusses developments in the area of automatic parallelization of sequential code.
 
Programming Multicores: Do Applications Programmers Need to Write Explicitly Parallel Programs?
Found in: IEEE Micro
By David August, Keshav Pingali, Derek Chiou, Resit Sendag, Joshua J. Yi
Issue Date:May 2010
pp. 19-33
In this panel discussion from the 2009 Workshop on Computer Architecture Research Directions, David August and Keshav Pingali debate whether explicitly parallel programming is a necessary evil for applications programmers, assess the current state...
 
Revisiting the Sequential Programming Model for the Multicore Era
Found in: IEEE Micro
By Matthew J. Bridges, Neil Vachharajani, Yun Zhang, Thomas Jablin, David I. August
Issue Date:January 2008
pp. 12-20
Automatic parallelization has thus far not been successful at extracting scalable parallelism from general programs. An aggressive automatic thread extraction framework, coupled with natural extensions to the sequential programming model that allow for a r...
 
Automatic Instruction-Level Software-Only Recovery
Found in: IEEE Micro
By George A. Reis, Jonathan Chang, David I. August
Issue Date:January 2007
pp. 36-47
Software-only reliability techniques protect against transient faults without the overhead of hardware techniques. However, although existing low-level software-only fault-tolerance techniques detect faults, they offer no recovery assistance. This article ...
 
Automatically exploiting cross-invocation parallelism using runtime information
Found in: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
By Jialu Huang, Thomas B. Jablin, Stephen R. Beard, Nick P. Johnson, David I. August
Issue Date:February 2013
pp. 1-11
Automatic parallelization is a promising approach to producing scalable multi-threaded programs for multicore architectures. Many existing automatic techniques only parallelize iterations within a loop invocation and synchronize threads at the end of each ...
 
The SPARCHS Project: Hardware Support for Software Security
Found in: SysSec Workshop
By Simha Sethumadhavan, Salvatore J. Stolfo, Angelos Keromytis, Junfeng Yang, David August
Issue Date:July 2011
pp. 119-122
This paper describes the SPARCHS project at Columbia and Princeton Universities. Drawing inspiration from biological defenses, this project aims to enhance security with clean-slate design of hardware. The ideas to be explored in the project and current st...
 
Scalable Speculative Parallelization on Commodity Clusters
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, David I. August
Issue Date:December 2010
pp. 3-14
While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared m...
 
Global Multi-Threaded Instruction Scheduling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Guilherme Ottoni, David August
Issue Date:December 2007
pp. 56-68
Recently, the microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro-architectural limitations. Despite this move, CMPs do not directly impro...
 
Revisiting the Sequential Programming Model for Multi-Core
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Matthew Bridges, Neil Vachharajani, Yun Zhang, Thomas Jablin, David August
Issue Date:December 2007
pp. 69-84
Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debuggi...
 
Speculative Decoupled Software Pipelining
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni, David I. August
Issue Date:September 2007
pp. 49-59
In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread e...
 
Support for High-Frequency Streaming in CMPs
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Ram Rangan, Neil Vachharajani, Adam Stoler, Guilherme Ottoni, David I. August, George Z. N. Cai
Issue Date:December 2006
pp. 259-272
As the industry moves toward larger-scale chip multiprocessors, the need to parallelize applications grows. High inter-thread communication delays, exacerbated by over-stressed high-latency memory subsystems and ever-increasing wire delays, requir...
 
Automatic Instruction-Level Software-Only Recovery
Found in: Dependable Systems and Networks, International Conference on
By Jonathan Chang, George A. Reis, David I. August
Issue Date:June 2006
pp. 83-92
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Computer architects have typically addressed reliability issues by adding redundant hardware, but these techniques...
 
From Sequential Programs to Concurrent Threads
Found in: IEEE Computer Architecture Letters
By Guilherme Ottoni, Ram Rangan, Adam Stoler, Matthew J. Bridges, David I. August
Issue Date:January 2006
pp. N/A
Chip multiprocessors are of increasing importance due to recent difficulties in achieving higher clock frequencies in uniprocessors, but their success depends on finding useful work for the processor cores. This paper addresses this challenge by presenting...
 
Automatic Thread Extraction with Decoupled Software Pipelining
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Guilherme Ottoni, Ram Rangan, Adam Stoler, David I. August
Issue Date:November 2005
pp. 105-118
Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide range of applications. Current difficulties in maintaining this tren...
 
Design and Evaluation of Hybrid Fault-Detection Systems
Found in: Computer Architecture, International Symposium on
By George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August, Shubhendu S. Mukherjee
Issue Date:June 2005
pp. 148-159
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Up to now, system designers have primarily considered hardware-only and software-only fault-detection mec...
 
Practical and Accurate Low-Level Pointer Analysis
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Bolei Guo, Matthew J. Bridges, Spyridon Triantafyllis, Guilherme Ottoni, Easwaran Raman, David I. August
Issue Date:March 2005
pp. 291-302
Pointer analysis is traditionally performed once, early in the compilation process, upon an intermediate representation (IR) with source-code semantics. However, performing pointer analysis only once at this level imposes a phase-ordering constraint, causi...
 
SWIFT: Software Implemented Fault Tolerance
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August
Issue Date:March 2005
pp. 243-254
To improve performance and reduce power, processor designers employ advances that shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates. However, these advances make processors more susceptible to transient faults that ...
 
RIFLE: An Architectural Framework for User-Centric Information-Flow Security
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Neil Vachharajani, Matthew J. Bridges, Jonathan Chang, Ram Rangan, Guilherme Ottoni, Jason A. Blome, George A. Reis, Manish Vachharajani, David I. August
Issue Date:December 2004
pp. 243-254
Even as modern computing systems allow the manipulation and distribution of massive amounts of information, users of these systems are unable to manage the confidentiality of their data in a practical fashion. Conventional access control security mechanism...
 
Facilitating Reuse in Hardware Models with Enhanced Type Inference
Found in: Hardware/software codesign and system synthesis, International conference on
By Manish Vachharajani, Neil Vachharajani, Sharad Malik, David I. August
Issue Date:September 2004
pp. 86-91
High-level hardware modeling is an essential, yet time-consuming, part of system design. However, effective component-based reuse in hardware modeling languages can reduce model construction time and enable the exploration of more design alternati...
 
Decoupled Software Pipelining with the Synchronization Array
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Ram Rangan, Neil Vachharajani, Manish Vachharajani, David I. August
Issue Date:October 2004
pp. 177-188
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the performance of microprocessors, certain codes remain elusive. In particular, codes containing recursive data structure (RDS) traversal loops have been largely immune...
 
Achieving Structural and Composable Modeling of Complex Systems
Found in: Parallel and Distributed Processing Symposium, International
By David I. August, Sharad Malik, Li-Shiuan Peh, Vijay Pai
Issue Date:April 2004
pp. 196a
This paper describes a recently-released, structural and composable modeling system called the Liberty Simulation Environment (LSE). LSE automatically constructs simulators from system descriptions that closely resemble the structure of hardware at the cho...
 
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Qiang Wu, Artem Pyatakov, Alexey Spiridonov, Easwaran Raman, Douglas W. Clark, David I. August
Issue Date:March 2004
pp. 315
Memory profiling is the process of characterizing a program's memory behavior by observing and recording its response to specific input sets. Relevant aspects of the program's memory behavior may then be used to guide memory optimizations in an aggressivel...
 
Compiler Optimization-Space Exploration
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, David I. August
Issue Date:March 2003
pp. 204
To meet the demands of modern architectures, optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms. Since code transformations may often degrade performance or interfere with subsequent transformation...
 
Retargetable Static Timing Analysis for Embedded Software
Found in: System Synthesis, International Symposium on
By Sharad Malik, David I. August, Kaiyu Chen
Issue Date:October 2001
pp. 39-44
This paper presents a novel approach for retargetable static software timing analysis. Specifically, we target the problem of determining bounds on the execution time of a program on modern processors, and solve this problem in a retargetable software deve...
 
The Program Decision Logic Approach to Predicated Execution
Found in: Computer Architecture, International Symposium on
By David I. August, John W. Sias, Jean-Michel Puiatti, Scott A. Mahlke, Daniel A. Connors, Kevin M. Crozier, Wen-mei W. Hwu
Issue Date:May 1999
pp. 0208
Modern compilers must expose sufficient amounts of Instruction-Level Parallelism (ILP) to achieve the promised performance increases of superscalar and VLIW processors. One of the major impediments to achieving this goal has been inefficient programmatic c...
 
Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results
Found in: High-Performance Computer Architecture, International Symposium on
By David I. August, Daniel A. Connors, John C. Gyllenhaal, Wen-mei W. Hwu
Issue Date:February 1997
pp. 84
This paper introduces a new architectural approach that supports compiler-synthesized dynamic branch prediction. In compiler-synthesized dynamic branch prediction, the compiler generates code sequences that, when executed, digest relevant state informatio...
 
A survey of the practice of computational science
Found in: State of the Practice Reports (SC '11)
By Arun Raman, David I. August, David Walker, Feng Liu, Hanjun Kim, Jialu Huang, Matthew Zoufaly, Nick P. Johnson, Prakash Prabhu, Soumyadeep Ghosh, Stephen Beard, Taewook Oh, Thomas B. Jablin, Yun Zhang
Issue Date:November 2011
pp. 1-12
Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation than professional programmers. In order to accelerate the progress of science, computer scientists...
     
Fault-tolerant typed assembly language
Found in: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (PLDI '07)
By David I. August, David Walker, Frances Perry, George A. Reis, Jay Ligatti, Lester Mackey
Issue Date:June 2007
pp. 42-53
A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. I...
     
Static typing for a faulty lambda calculus
Found in: Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming (ICFP '06)
By David I. August, David Walker, George A. Reis, Jay Ligatti, Lester Mackey
Issue Date:September 2006
pp. 127-134
A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorrect program execution by altering signal transfers or stored values. While th...
     
The Liberty Simulation Environment: A deliberate approach to high-level system modeling
Found in: ACM Transactions on Computer Systems (TOCS)
By David A. Penry, David I. August, Jason A. Blome, Manish Vachharajani, Neil Vachharajani, Sharad Malik
Issue Date:August 2006
pp. 211-249
In digital hardware system design, the quality of the product is directly related to the number of meaningful design alternatives properly considered. Unfortunately, existing modeling methodologies and tools have properties which make them less than ideal ...
     
Optimizations for a simulator construction system supporting reusable components
Found in: Proceedings of the 40th conference on Design automation (DAC '03)
By David A. Penry, David I. August
Issue Date:June 2003
pp. 926-931
Exploring a large portion of the microprocessor design space requires the rapid development of efficient simulators. While some systems support rapid model development through the structural composition of reusable concurrent components, the Liberty Simula...
     
Accurate and Efficient Predicate Analysis with Binary Decision Diagrams
Found in: Microarchitecture, IEEE/ACM International Symposium on
By John W. Sias, Wen-mei W. Hwu, David I. August
Issue Date:December 2000
pp. 112
Functionality and performance of EPIC architectural features depend on extensive compiler support. Predication, one of these features, promises to reduce control flow overhead and to enhance optimization, provided that compilers can utilize it effectively....
 
A Comparison of Full and Partial Predicated Execution Support for ILP Processors
Found in: Computer Architecture, International Symposium on
By James E. McCormick, Richard E. Hank, Wen-mei W. Hwu, David I. August, Scott A. Mahlke
Issue Date:June 1995
pp. 138
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support pr...
 
Fast condensation of the program dependence graph
Found in: Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation (PLDI '13)
By Ayal Zaks, David I. August, Nick P. Johnson, Taewook Oh
Issue Date:June 2013
pp. 39-50
Aggressive compiler optimizations are formulated around the Program Dependence Graph (PDG). Many techniques, including loop fission and parallelization are concerned primarily with dependence cycles in the PDG. The Directed Acyclic Graph of Strongly Connec...
     
Practical automatic loop specialization
Found in: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '13)
By David I. August, Hanjun Kim, Jae W. Lee, Nick P. Johnson, Taewook Oh
Issue Date:March 2013
pp. 419-430
Program specialization optimizes a program with respect to program invariants, including known, fixed inputs. These invariants can be used to enable optimizations that are otherwise unsound. In many applications, a program input induces predictable pattern...
     
Encore: low-cost, fine-grained transient fault recovery
Found in: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11)
By Amin Ansari, Scott A. Mahlke, Shantanu Gupta, David I. August, Shuguang Feng
Issue Date:December 2011
pp. 398-409
To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions. However, the pursuit of faster processors and longer battery life has come at the cost of reliability. Given the rise of...
     
Bundled execution of recurring traces for energy-efficient general purpose processing
Found in: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44 '11)
By Amin Ansari, Scott Mahlke, David August, Shantanu Gupta, Shuguang Feng
Issue Date:December 2011
pp. 12-23
Technology scaling has delivered on its promises of increasing device density on a single chip. However, the voltage scaling trend has failed to keep up, introducing tight power constraints on manufactured parts. In such a scenario, there is a need to inco...
     
Automatic CPU-GPU communication management and optimization
Found in: Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation (PLDI '11)
By David I. August, James A. Jablin, Nick P. Johnson, Prakash Prabhu, Stephen R. Beard, Thomas B. Jablin
Issue Date:June 2011
pp. 123-128
The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is limited by the complexities of CPU-GPU communication. To address these commun...
     
Parallelism orchestration using DoPE: the degree of parallelism executive
Found in: Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation (PLDI '11)
By Arun Raman, David I. August, Hanjun Kim, Jae W. Lee, Taewook Oh
Issue Date:June 2011
pp. 123-128
In writing parallel programs, programmers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platform...
     
Commutative set: a language extension for implicit parallel programming
Found in: Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation (PLDI '11)
By David I. August, Nick P. Johnson, Prakash Prabhu, Soumyadeep Ghosh, Yun Zhang
Issue Date:June 2011
pp. 123-128
Sequential programming models express a total program order, of which a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer written semantic commutativity assertions provide a natural way of r...
     
DAFT: decoupled acyclic fault tolerance
Found in: Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10)
By David I. August, Jae W. Lee, Nick P. Johnson, Yun Zhang
Issue Date:September 2010
pp. 87-98
Higher transistor counts, lower voltage levels, and reduced noise margin increase the susceptibility of multicore processors to transient faults. Redundant hardware modules can detect such errors, but software transient fault detection techniques are more ...
     
Speculative parallelization using software multi-threaded transactions
Found in: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems (ASPLOS '10)
By Arun Raman, David I. August, Hanjun Kim, Thomas B. Jablin, Thomas R. Mason
Issue Date:March 2010
pp. 222-230
With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative tech...
     
Performance scalability of decoupled software pipelining
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By David I. August, Guilherme Ottoni, Neil Vachharajani, Ram Rangan
Issue Date:August 2008
pp. 1-25
Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed pipelined multithreading (PMT) te...
     
Spice: speculative parallel iteration chunk execution
Found in: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization (CGO '08)
By David I. August, Easwaran Raman, Neil Vachharajani, Ram Rangan
Issue Date:April 2008
pp. 49-54
The recent trend in the processor industry of packing multiple processor cores in a chip has increased the importance of automatic techniques for extracting thread level parallelism. A promising approach for extracting thread level parallelism in general p...
     
Parallel-stage decoupled software pipelining
Found in: Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization (CGO '08)
By Arun Raman, David I. August, Easwaran Raman, Guilherme Ottoni, Matthew J. Bridges
Issue Date:April 2008
pp. 49-54
In recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and new applications to make effective use of CMPs, it is desirable that compilers ...
     
Communication optimizations for global multi-threaded instruction scheduling
Found in: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems (ASPLOS XIII)
By David I. August, Guilherme Ottoni
Issue Date:March 2008
pp. 1-1
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several limit studies, most of the parallelization opportunities require looking for par...
     