Block Unification IF-conversion for High Performance Architectures
Found in: IEEE Computer Architecture Letters
By Nadav Rotem, Yosi Ben-Asher
Issue Date: January 2014
pp. 1-1
Graphics Processing Units accelerate data-parallel graphics calculations using wide SIMD vector units. Compiling programs for the GPU's SIMD architecture requires converting multiple control flow paths into a single stream of instructions. IF-conver...
Optimizing Wait States in the Synthesis of Memory References with Unpredictable Latencies
Found in: ACM Transactions on Reconfigurable Technology and Systems (TRETS)
By Nadav Rotem, Ron Meldiner, Yosi Ben-Asher
Issue Date: December 2013
pp. 1-9
We consider the problem of synthesizing circuits (from C to Verilog) that are optimized to handle unpredictable latencies of memory operations. Unpredictable memory latencies can occur due to the use of on-chip caches, DRAM memory modules, buffers/queues, ...
The benefits of using variable-length pipelined operations in high-level synthesis
Found in: ACM Transactions on Embedded Computing Systems (TECS)
By Yosi Ben-Asher, Nadav Rotem
Issue Date: December 2013
pp. 1-23
Current high-level synthesis systems synthesize arithmetic units with a fixed, known number of stages, and the scheduler mainly determines when units are activated. We focus on scheduling techniques for the high-level synthesis of pipelined arithmetic units w...
Hybrid type legalization for a sparse SIMD instruction set
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Nadav Rotem, Yosi Ben-Asher
Issue Date: September 2013
pp. 1-14
SIMD vector units implement only a subset of the operations used by vectorizing compilers, and there are multiple conflicting techniques to legalize arbitrary vector types into register-sized data types. Traditionally, type legalization is performed using ...
Automatic memory partitioning: increasing memory parallelism via data structure partitioning
Found in: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis (CODES/ISSS '10)
By Nadav Rotem, Yosi Ben-Asher
Issue Date: October 2010
pp. 155-162
In high-level synthesis, pipelined designs are often restricted by the number of memory banks available to the synthesis system. Using multiple memory banks can improve the performance of accelerated applications. Currently, programmers must manually assig...
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs
Found in: ACM Transactions on Reconfigurable Technology and Systems (TRETS)
By Danny Meisler, Nadav Rotem, Yosi Ben-Asher
Issue Date: September 2010
pp. 1-19
In High-Level Synthesis (HLS), extracting parallelism in order to create small and fast circuits is the main advantage of HLS over software execution. Modulo Scheduling (MS) is a technique in which a loop is parallelized by overlapping different parts of s...
The effect of unrolling and inlining for Python bytecode optimizations
Found in: Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference (SYSTOR '09)
By Nadav Rotem, Yosi Ben-Asher
Issue Date: May 2009
pp. 1-42
In this study, we consider bytecode optimizations for Python, a programming language which combines object-oriented concepts with features of scripting languages, such as dynamic dictionaries. Due to its design nature, Python is relatively slow compared to...