Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors
Found in: Computer Design, International Conference on
By Shrikanth Ganapathy, Ramon Canal, Antonio Gonzalez, Antonio Rubio
Issue Date:October 2011
pp. 332-338
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce standby leakage power in first level data-caches under process variations. Accessed physical arrays are forward body biased (FBB) to improve latency while idle (una...
 
A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance
Found in: 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
By Shrikanth Ganapathy, Ramon Canal, Dan Alexandrescu, Enrico Costenaro, Antonio Gonzalez, Antonio Rubio
Issue Date:September 2012
pp. 472-477
In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology in addition to being logic-compatible, are variati...
 
Near-Optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms
Found in: Parallel Processing Workshops, International Conference on
By Jaume Abella, Antonio González, Josep Llosa, Xavier Vera
Issue Date:August 2002
pp. 568
The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, which is a code transformation targeted to red...
 
Fg-STP: Fine-Grain Single Thread Partitioning on Multicores
Found in: High-Performance Computer Architecture, International Symposium on
By Rakesh Ranjan, Fernando Latorre, Pedro Marcuello, Antonio Gonzalez
Issue Date:February 2011
pp. 15-24
Power and complexity issues have led the microprocessor industry to shift to Chip Multiprocessors in order to be able to better utilize the additional transistors ensured by Moore's law. While parallel programs are going to be able to take most of the adva...
 
Compiler Directed Early Register Release
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Timothy M. Jones, Michael F.P. O'Boyle, Jaume Abella, Antonio Gonzalez, Oğuz Ergin
Issue Date:September 2005
pp. 110-122
This paper presents a novel compiler directed technique to reduce the register pressure and power of the register file by releasing registers early. The compiler identifies registers that will only be read once and renames them to different logica...
 
Power- and Complexity-Aware Issue Queue Designs
Found in: IEEE Micro
By Jaume Abella, Ramon Canal, Antonio González
Issue Date:September 2003
pp. 50-58
The improved performance of current microprocessors brings with it increasingly complex and power-dissipating issue logic. Recent proposals introduce a range of mechanisms for tackling this problem.
 
Analyzing Data Locality in Numeric Applications
Found in: IEEE Micro
By Jesús Sánchez, Antonio González
Issue Date:July 2000
pp. 58-66
SPLAT provides programmers a fast and accurate study of memory behavior without the necessity of a costly memory simulator. The tool is suitable for use as a step in an iterative optimization process in time-consuming numeric applications.
 
A Power-Efficient Co-designed Out-of-Order Processor
Found in: Computer Architecture and High Performance Computing, Symposium on
By Abhishek Deb, Josep Maria Codina, Antonio González
Issue Date:October 2011
pp. 1-8
A co-designed processor helps in cutting down both the complexity and power consumption by co-designing certain key performance enablers. In this paper, we propose a FIFO based co-designed out-of-order processor. Multiple FIFOs are added in order to dynami...
 
A HW/SW Co-designed Programmable Functional Unit
Found in: IEEE Computer Architecture Letters
By Abhishek Deb, Josep Maria Codina, Antonio Gonzalez
Issue Date:January 2012
pp. 9-12
In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor. Code is transformed and instructions are generated that run on the PFU using a co-designed virt...
 
A Co-designed HW/SW Approach to General Purpose Program Acceleration Using a Programmable Functional Unit
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Abhishek Deb, Josep Maria Codina, Antonio González
Issue Date:February 2011
pp. 1-8
In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor in a complexity-effective way. Code is transformed and instructions are generated that run on th...
 
Hardware/software-based diagnosis of load-store queues using expandable activity logs
Found in: High-Performance Computer Architecture, International Symposium on
By Javier Carretero, Xavier Vera, Jaume Abella, Tanausu Ramirez, Matteo Monchiero, Antonio Gonzalez
Issue Date:February 2011
pp. 321-331
The increasing device count and design complexity are posing significant challenges to post-silicon validation. Bug diagnosis is the most difficult step during post-silicon validation. Limited reproducibility and low testing speeds are common limitations i...
 
Reliability: Fallacy or Reality?
Found in: IEEE Micro
By Antonio González, Scott Mahlke, Shubu Mukherjee, Resit Sendag, Derek Chiou, Joshua J. Yi
Issue Date:November 2007
pp. 36-45
As chip architects and manufacturers plumb ever-smaller process technologies, new species of faults are compromising device reliability. Following an introduction by Antonio González, Scott Mahlke and Shubu Mukherjee debate whether reliability is ...
 
Guest Editors' Introduction: Micro's Top Picks from the Microarchitecture Conferences
Found in: IEEE Micro
By Ronny Ronen, Antonio González
Issue Date:January 2007
pp. 8-11
The guest editors introduce this special issue showcasing Micro's Top Picks from the Microarchitecture Conferences of 2006. They describe the issue's intensive submission and selection process. The articles focus on the design of resilient computing system...
 
Distributed Data Cache Designs for Clustered VLIW Processors
Found in: IEEE Transactions on Computers
By Enric Gibert, Jesús Sánchez, Antonio González
Issue Date:October 2005
pp. 1227-1241
Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset ...
 
Memory Bank Predictors
Found in: Computer Design, International Conference on
By Stefan Bieschewski, Joan-Manuel Parcerisa, Antonio González
Issue Date:October 2005
pp. 666-670
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and latency. The early knowledge of the data cache bank that an instruction will access can help to improve the performance in several ways. One scenario th...
 
Control-Flow Independence Reuse via Dynamic Vectorization
Found in: Parallel and Distributed Processing Symposium, International
By Alex Pajuelo, Antonio González, Mateo Valero
Issue Date:April 2005
pp. 21a
Current processors exploit out-of-order execution and branch prediction to improve instruction level parallelism. When a branch prediction is wrong, processors flush the pipeline and squash all the speculative work. However, control-flow independent instru...
 
Compiler Analysis for Trace-Level Speculative Multithreaded Architectures
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Carlos Molina, Antonio Gonzalez, Jordi Tubella
Issue Date:February 2005
pp. 2-10
Trace-Level Speculative Multithreaded Processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several trac...
 
Software Directed Issue Queue Power Reduction
Found in: High-Performance Computer Architecture, International Symposium on
By Timothy M. Jones, Michael F. P. O'Boyle, Jaume Abella, Antonio González
Issue Date:February 2005
pp. 144-153
The issue logic of a superscalar processor dissipates a large amount of static and dynamic power. Furthermore, its power density makes it a hot-spot requiring expensive cooling systems and additional packaging. In this paper we prese...
 
Software-Controlled Operand-Gating
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Ramon Canal, Antonio González, James E. Smith
Issue Date:March 2004
pp. 125
Operand gating is a technique for improving processor energy efficiency by gating off sections of the data path that are unneeded by short-precision (narrow) operands. A method for implementing software-controlled power gating is proposed and evaluated. Th...
 
Low-Complexity Distributed Issue Queue
Found in: High-Performance Computer Architecture, International Symposium on
By Jaume Abella, Antonio González
Issue Date:February 2004
pp. 73
As technology evolves, power density significantly increases and cooling systems become more complex and expensive. The issue logic is one of the processor hotspots and, at the same time, its latency is crucial for the processor performance...
 
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Enric Gibert, Jesús Sánchez, Antonio González
Issue Date:December 2003
pp. 315
Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of ...
 
Power Efficient Data Cache Designs
Found in: Computer Design, International Conference on
By Jaume Abella, Antonio González
Issue Date:October 2003
pp. 8
This paper investigates some power efficient data cache designs that try to significantly reduce the cache energy consumption, both static and dynamic, with a minimal impact in performance. The basic idea is to combine different threshold voltages with dif...
 
On Reducing Register Pressure and Energy in Multiple-Banked Register Files
Found in: Computer Design, International Conference on
By Jaume Abella, Antonio González
Issue Date:October 2003
pp. 14
The storage for speculative values in superscalar processors is one of the main sources of complexity and power dissipation. In this paper, we present a novel technique to reduce register requirements as well as their dynamic and static power dissipation t...
 
Optimizing Program Locality Through CMEs and GAs
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Xavier Vera, Jaume Abella, Antonio González, Josep Llosa
Issue Date:October 2003
pp. 68
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data...
 
Local Scheduling Techniques for Memory Coherence in a Clustered VLIW Processor with a Distributed Data Cache
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Enric Gibert, Jesús Sánchez, Antonio González
Issue Date:March 2003
pp. 193
Clustering is a common technique to deal with wire delays. Fully-distributed architectures, where the register file, the functional units and the cache memory are partitioned, are particularly effective to deal with these constraints and besides t...
 
Power-Aware Control Speculation through Selective Throttling
Found in: High-Performance Computer Architecture, International Symposium on
By Juan L. Aragón, José González, Antonio González
Issue Date:February 2003
pp. 103
With the constant advances in technology that lead to the increasing of the transistor count and processor frequency, power dissipation is becoming one of the major issues in high-performance processors. These processors increase their clock frequ...
 
Effective Instruction Scheduling Techniques for an Interleaved Cache Clustered VLIW Processor
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Enric Gibert, Jesús Sánchez, Antonio González
Issue Date:November 2002
pp. 123
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technology. Fully-distributed architectures, where the register file, the functional units and the data cache are partitioned, are particularly effective to de...
 
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Alex Aletà, Josep M. Codina, Jesús Sánchez, Antonio González, David Kaeli
Issue Date:September 2002
pp. 281
This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning algorithms that are guided by...
 
Trace-Level Speculative Multithreaded Architecture
Found in: Computer Design, International Conference on
By Carlos Molina, Antonio González, Jordi Tubella
Issue Date:September 2002
pp. 402
This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant p...
 
Hardware Schemes for Early Register Release
Found in: Parallel Processing, International Conference on
By Teresa Monreal, Víctor Viñals, Antonio González, Mateo Valero
Issue Date:August 2002
pp. 5
Register files are becoming one of the critical components of current out-of-order processors in terms of delay and power consumption, since their potential to exploit instruction-level parallelism is quite related to the size and number of ports of the re...
 
Speculative Dynamic Vectorization
Found in: Computer Architecture, International Symposium on
By Alex Pajuelo, Antonio Gonzalez, Mateo Valero
Issue Date:May 2002
pp. 271
Traditional vector architectures have shown to be very effective for regular codes where the compiler can detect data-level parallelism. However, this SIMD parallelism is also present in irregular or pointer-rich codes, for which the compiler is quite limi...
 
Selective Branch Prediction Reversal by Correlating with Data Values and Control Flow
Found in: Computer Design, International Conference on
By Juan L. Aragón, José González, José M. García, Antonio González
Issue Date:September 2001
pp. 228
Abstract: Branch prediction is one of the main hurdles in the roadmap towards deeper pipelines and higher clock frequencies. This work presents a new approach to enhancing current branch predictors: Selective Branch Prediction Reversal. The rationale behin...
 
Energy-Effective Issue Logic
Found in: Computer Architecture, International Symposium on
By Daniele Folegnani, Antonio González
Issue Date:July 2001
pp. 230
Abstract: The issue logic of a dynamically-scheduled superscalar processor is a complex mechanism devoted to start the execution of multiple instructions every cycle. Due to its complexity, it is responsible for a significant percentage of the energy consu...
 
Instruction Scheduling for Clustered VLIW Architectures
Found in: System Synthesis, International Symposium on
By Jesús Sánchez, Antonio González
Issue Date:September 2000
pp. 41
Clustered VLIW organizations are nowadays a common trend in the design of embedded/DSP processors. In this work, we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instructio...
 
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
Found in: Parallel Processing, International Conference on
By Jesús Sánchez, Antonio González
Issue Date:August 2000
pp. 555
Clustered organizations are becoming a common trend in the design of VLIW architectures. In this work, we propose a novel modulo scheduling approach for such architectures. The proposed technique performs the cluster assignment and the instruction scheduli...
 
Multiple-Banked Register File Architectures
Found in: Computer Architecture, International Symposium on
By Mateo Valero, Antonio González, Nigel P. Topham, José-Lorenzo Cruz
Issue Date:June 2000
pp. 316
The register file access time is one of the critical delays in current superscalar processors. Its impact on processor performance is likely to increase in future processor generations, as they are expected to increase the issue width (which implies more r...
 
Dynamic Cluster Assignment Mechanisms
Found in: High-Performance Computer Architecture, International Symposium on
By Ramon Canal, Joan Manuel Parcerisa, Antonio Gonzalez
Issue Date:January 2000
pp. 133
Clustered microarchitectures are an effective approach to reducing the penalties caused by wire delays inside a chip. Current superscalar processors have in fact a two-cluster microarchitecture with a naive code partitioning approach: integer instructions ...
 
Value Prediction for Speculative Multithreaded Architectures
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Pedro Marcuello, Jordi Tubella, Antonio González
Issue Date:November 1999
pp. 230
The speculative multithreading paradigm (speculative thread-level parallelism) is based on the concurrent execution of control-speculative threads. The efficiency of microarchitectures that adopt this paradigm strongly depends on the performance of the con...
 
Delaying Physical Register Allocation through Virtual-Physical Registers
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Teresa Monreal, Victor Viñals, Antonio González, Mateo Valero, José González
Issue Date:November 1999
pp. 186
Register file access time represents one of the critical delays of current microprocessors, and it is expected to become more critical as future processors increase the instruction window size and the issue width. This paper presents a novel physical regist...
 
Control-Flow Speculation through Value Prediction for Superscalar Processors
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Jose Gonzalez, Antonio Gonzalez
Issue Date:October 1999
pp. 57
In this paper, we introduce a new branch predictor that predicts the outcomes of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values. The design of a hybrid predictor compr...
 
A Cost-Effective Clustered Architecture
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Ramon Canal, Joan-Manuel Parcerisa, Antonio Gonzalez
Issue Date:October 1999
pp. 160
In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With...
 
Trace-Level Reuse
Found in: Parallel Processing, International Conference on
By Antonio Gonzalez, Jordi Tubella, Carlos Molina
Issue Date:September 1999
pp. 30
Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, the instructions that make up such traces have the same source operand values. ...
 
The Synergy of Multithreading and Access/Execute Decoupling
Found in: High-Performance Computer Architecture, International Symposium on
By Joan-Manuel Parcerisa, Antonio González
Issue Date:January 1999
pp. 59
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other: while decoupling features an excellent m...
 
Fast, Accurate and Flexible Data Locality Analysis
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Jesus Sánchez, Antonio González
Issue Date:October 1998
pp. 124
This paper presents a tool based on a new approach for analyzing the locality exhibited by data memory references. The tool is very fast because it is based on a static locality analysis enhanced with very simple profiling information, which results in a n...
 
The Latency Hiding Effectiveness of Decoupled Access/Execute Processors
Found in: EUROMICRO Conference
By Joan-Manuel Parcerisa, Antonio González
Issue Date:August 1998
pp. 10293
Several studies have demonstrated that out-of-order execution processors may not be the most adequate organization for wide issue processors due to the increasing penalties that wire delays will cause in the issue logic. The main target of out-of-...
 
Software Prefetching for Software Pipelined Loops
Found in: Hawaii International Conference on System Sciences
By F. Jesus Sanchez, Antonio Gonzalez
Issue Date:January 1998
pp. 778
This paper investigates the interaction between software pipelining and different software prefetching techniques for VLIW machines. It is shown that processor stalls due to memory dependences have a great impact on execution time. A novel heuristic is p...
   
Cache Sensitive Modulo Scheduling
Found in: Microarchitecture, IEEE/ACM International Symposium on
By F. Jesus Sanchez, Antonio Gonzalez
Issue Date:December 1997
pp. 338
This paper focuses on the interaction between software prefetching (both binding and nonbinding) and software pipelining for VLIW machines. First, it is shown that evaluating software pipelined schedules without considering memory effects can be rather ina...
 
Executing Algorithms with Hypercube Topology on Torus Multicomputers
Found in: IEEE Transactions on Parallel and Distributed Systems
By Antonio González, Miguel Valero-García, Luis Díaz de Cerio
Issue Date:August 1995
pp. 803-814
Abstract: Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers the communication cost is kept minimum since processes can be ...
 
Dynamic Selective Devectorization for Efficient Power Gating of SIMD Units in a HW/SW Co-Designed Environment
Found in: 2013 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
By Rakesh Kumar, Alejandro Martinez, Antonio Gonzalez
Issue Date:October 2013
pp. 81-88
Leakage power is a growing concern in current and future microprocessors. Functional units of microprocessors are responsible for a major fraction of this power. Therefore, reducing functional unit leakage has received much attention in the recent years. P...
 
Exploiting temporal locality in network traffic using commodity multi-cores
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Govind Sreekar Shenoy, Jordi Tubella, Antonio Gonzalez
Issue Date:April 2012
pp. 110-111
Network traffic has traditionally exhibited temporal locality in the header field of packets. Such locality is intuitive and is very well studied over the years. In this work we study temporal locality in the packet payload. Temporal locality can also be v...
 