Displaying 1-19 out of 19 total
Hybrid Dataflow/von-Neumann Architectures
Found in: IEEE Transactions on Parallel and Distributed Systems
By Fahimeh Yazdanpanah, Carlos Alvarez-Martinez, Daniel Jimenez-Gonzalez, Yoav Etsion
Issue Date:June 2014
pp. 1489-1509
General-purpose hybrid dataflow/von-Neumann architectures are gaining traction as effective parallel platforms. Although different implementations differ in the way they merge the conceptually different computational models, they all follow similar princ...
Memristor-Based Multithreading
Found in: IEEE Computer Architecture Letters
By Shahar Kvatinsky, Yuval H. Nacson, Yoav Etsion, Eby G. Friedman, Avinoam Kolodny, Uri C. Weiser
Issue Date:January 2014
pp. 1-1
Switch on Event Multithreading (SoE MT, also known as coarse-grained MT and block MT) processors run multiple threads on a pipeline machine, while the pipeline switches threads on stall events (e.g., cache miss). The thread switch penalty is determined by ...
Exploiting Core Working Sets to Filter the L1 Cache with Random Sampling
Found in: IEEE Transactions on Computers
By Yoav Etsion, Dror G. Feitelson
Issue Date:November 2012
pp. 1535-1550
Locality is often characterized by working sets, defined by Denning as the set of distinct addresses referenced within a certain window of time. This definition ignores the fact that dramatic differences exist between the usage patterns of frequently used ...
On the memory system requirements of future scientific applications: Four case-studies
Found in: IEEE Workload Characterization Symposium
By Milan Pavlovic, Yoav Etsion, Alex Ramirez
Issue Date:November 2011
pp. 159-170
In this paper, we observe and characterize the memory behaviour, and specifically memory footprint, memory bandwidth and cache effectiveness, of several well-known parallel scientific applications running on a large processor cluster. Based on the analysis...
DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Carlos Villavieja, Vasileios Karakostas, Lluis Vilanova, Yoav Etsion, Alex Ramirez, Avi Mendelson, Nacho Navarro, Adrian Cristal, Osman S. Unsal
Issue Date:October 2011
pp. 340-349
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, are paramount to performance scalability. The emergence of chip-multiprocessors (CM...
Trace-driven simulation of multithreaded applications
Found in: Performance Analysis of Systems and Software, IEEE International Symposium on
By Alejandro Rico, Alejandro Duran, Felipe Cabarcas, Yoav Etsion, Alex Ramirez, Mateo Valero
Issue Date:April 2011
pp. 87-96
Over the past few years, computer architecture research has moved towards execution-driven simulation, due to the inability of traces to capture timing-dependent thread execution interleaving. However, trace-driven simulation has many advantages over execu...
Task Superscalar: An Out-of-Order Task Pipeline
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Yoav Etsion, Felipe Cabarcas, Alejandro Rico, Alex Ramirez, Rosa M. Badia, Eduard Ayguade, Jesus Labarta, Mateo Valero
Issue Date:December 2010
pp. 89-100
We present Task Superscalar, an abstraction of the instruction-level out-of-order pipeline that operates at the task level. Like ILP pipelines, which uncover parallelism in a sequential instruction stream, task superscalar uncovers task-level parallel...
A global scheduling framework for virtualization environments
Found in: Parallel and Distributed Processing Symposium, International
By Yoav Etsion, Tal Ben-Nun, Dror G. Feitelson
Issue Date:May 2009
pp. 1-8
A premier goal of resource allocators in virtualization environments is to control the relative resource consumption of the different virtual machines, and moreover, to be able to change the relative allocations at will. However, it is not clear what it me...
L1 Cache Filtering Through Random Selection of Memory References
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Yoav Etsion, Dror G. Feitelson
Issue Date:September 2007
pp. 235-244
Distinguishing transient blocks from frequently used blocks enables servicing references to transient blocks from a small fully-associative auxiliary cache structure. By inserting only frequently used blocks into the main cache structure, we can reduce the...
Backfilling Using System-Generated Predictions Rather than User Runtime Estimates
Found in: IEEE Transactions on Parallel and Distributed Systems
By Dan Tsafrir, Yoav Etsion, Dror G. Feitelson
Issue Date:June 2007
pp. 789-803
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time ...
Probabilistic Prediction of Temporal Locality
Found in: IEEE Computer Architecture Letters
By Yoav Etsion, Dror G. Feitelson
Issue Date:January 2007
pp. 17-20
The increasing gap between processor and memory speeds, as well as the introduction of multi-core CPUs, have exacerbated the dependency of CPU performance on the memory subsystem. This trend motivates the search for more efficient caching mechanisms, enabl...
User-Level Communication in a System with Gang Scheduling
Found in: Parallel and Distributed Processing Symposium, International
By Yoav Etsion, Dror G. Feitelson
Issue Date:April 2001
pp. 10058a
One of the scarce resources that limits communication performance is buffer space on the network interface card. This becomes even worse when it is partitioned among several time-sliced processes. However, if gang scheduling is used, it is possible to swap...
CODOMs: Protecting software with Code-centric memory Domains
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Lluis Vilanova, Muli Ben-Yehuda, Nacho Navarro, Yoav Etsion, Mateo Valero
Issue Date:June 2014
pp. 469-480
Today's complex software systems are neither secure nor reliable. The rudimentary software protection primitives provided by current hardware force systems to run many distrusting software components (e.g., procedures, libraries, plugins, modules) in the ...
Single-graph multiple flows: Energy efficient design alternative for GPGPUs
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Dani Voitsechov, Yoav Etsion
Issue Date:June 2014
pp. 205-216
We present the single-graph multiple-flows (SGMF) architecture that combines coarse-grain reconfigurable computing with dynamic dataflow to deliver massive thread-level parallelism. The CUDA-compatible SGMF architecture is positioned as an energy efficient...
On the simulation of large-scale architectures using multiple application abstraction levels
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Augusto Vega, Carlos Villavieja, Mateo Valero, Milan Pavlovic, Yoav Etsion, Alejandro Rico, Alex Ramirez, Felipe Cabarcas
Issue Date:January 2012
pp. 1-20
Simulation is a key tool for computer architecture research. In particular, cycle-accurate simulators are extremely important for microarchitecture exploration and detailed design decisions, but they are slow and, so, not suitable for simulating large-scal...
Implementation of a hierarchical N-body simulator using the OmpSs programming model
Found in: Proceedings of the first workshop on Irregular applications: architectures and algorithm (IAAA '11)
By Miquel Pericas, Xavier Martorell, Yoav Etsion
Issue Date:November 2011
pp. 23-30
Many HPC algorithms are highly irregular. They have input-dependent control flow and operate on pointer-based data structures such as trees, graphs, or linked lists. This irregularity makes it challenging to parallelize such algorithms in order to efficien...
Process prioritization using output production: Scheduling for multimedia
Found in: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
By Dan Tsafrir, Dror G. Feitelson, Yoav Etsion
Issue Date:November 2006
pp. 318-342
Desktop operating systems such as Windows and Linux base scheduling decisions on CPU consumption; processes that consume fewer CPU cycles are prioritized, assuming that interactive processes gain from this since they spend most of their time waiting for us...
Desktop scheduling: how can we know what the user wants?
Found in: Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video (NOSSDAV '04)
By Dan Tsafrir, Dror G. Feitelson, Yoav Etsion
Issue Date:June 2004
pp. 110-115
Current desktop operating systems use CPU utilization (or lack thereof) to prioritize processes for scheduling. This was thought to be beneficial for interactive processes, under the assumption that they spend much of their time waiting for user input. Thi...
Effects of clock resolution on the scheduling of interactive and soft real-time processes
Found in: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (SIGMETRICS '03)
By Dan Tsafrir, Dror G. Feitelson, Yoav Etsion
Issue Date:June 2003
pp. 172-183
It is commonly agreed that scheduling mechanisms in general purpose operating systems do not provide adequate support for modern interactive applications, notably multimedia applications. The common solution to this problem is to devise specialized schedul...