EU FP7-288307 Pharaon Project: Parallel and Heterogeneous Architecture for Real-Time Applications
Found in: 2013 Euromicro Conference on Digital System Design (DSD)
By Hector Posadas, Eugenio Villar, Florian Broekaert, Michel Bourdelles, Albert Cohen, Antoniu Pop, Nhat Minh Le, Adrien Guatto, Mihai T. Lazarescu, Luciano Lavagno, Andrei Terechko, Miguel Glassee, Daniel Calvo, Edouardo de las Heras
Issue Date:September 2013
pp. 371-378
In this article, we present the work-in-progress of the EU FP7 PHARAON project, started in September 2011. The first objective of the project is the development of new techniques and tools capable to assist the designer in the development of parallel embed...
 
Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs
Found in: IEEE Micro
By Feng Li, Antoniu Pop, Albert Cohen
Issue Date:July 2012
pp. 19-31
This article presents a general algorithm for transforming sequential imperative programs into parallel data-flow programs. The algorithm operates on a program dependence graph in static-single-assignment form, extracting task, pipeline, and data paralleli...
 
Correct and Efficient Bounded FIFO Queues
Found in: 2013 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
By Nhat Minh Le, Adrien Guatto, Albert Cohen, Antoniu Pop
Issue Date:October 2013
pp. 144-151
Bounded single-producer single-consumer FIFO queues are one of the simplest concurrent data-structure, and they do not require more than sequential consistency for correct operation. Still, sequential consistency is an unrealistic hypothesis on shared-memo...
 
The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices
Found in: 2013 Euromicro Conference on Digital System Design (DSD)
By Marco Solinas, Rosa M. Badia, Francois Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Feng Li, Mikel Lujan, Laurent Morin, Avi Mendelson, Nacho Navarro, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Mateo Valero, Sebastian Weis, Ian Watson, Stephane Zuckermann, Roberto Giorgi
Issue Date:September 2013
pp. 272-279
Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., composed by 1000 billion of transistors) will enable systems with 1000+ general purpose cores per chip, probably by 2020. Three major challenges have...
 
A polynomial spilling heuristic: Layered allocation
Found in: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
By Boubacar Diouf, Albert Cohen, Fabrice Rastello
Issue Date:February 2013
pp. 1-10
Register allocation is one of the most important, and one of the oldest compiler optimizations. It aims to map temporary variables to machine registers, and defaults to explicit load/store from memory when necessary. The latter option is referred to as spi...
 
Vapor SIMD: Auto-vectorize once, run everywhere
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Dorit Nuzman, Sergei Dyshel, Erven Rohou, Ira Rosen, Kevin Williams, David Yuste, Albert Cohen, Ayal Zaks
Issue Date:April 2011
pp. 151-160
Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficie...
 
Predictive modeling in a polyhedral optimization space
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Eunjung Park, Louis-Noel Pouchet, John Cavazos, Albert Cohen, P. Sadayappan
Issue Date:April 2011
pp. 119-129
Significant advances in compiler optimization have been made in recent years, enabling many transformations such as tiling, fusion, parallelization and vectorization on imperfectly nested loops. Nevertheless, the problem of finding the best combination of ...
 
Combined iterative and model-driven optimization in an automatic parallelization framework
Found in: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
By Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, P Sadayappan
Issue Date:November 2010
pp. 1-11
Today's multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance...
 
Polyhedral-Model Guided Loop-Nest Auto-Vectorization
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Konrad Trifunovic, Dorit Nuzman, Albert Cohen, Ayal Zaks, Ira Rosen
Issue Date:September 2009
pp. 327-337
Optimizing compilers strive to construct efficient executables by applying sequences of transformations. Additional transformations are constantly being devised, with various mutual interactions among them, thereby exacerbating the notoriously difficult ph...
 
Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations
Found in: Parallel and Distributed Computing, International Symposium on
By Anna Beletska, Wlodzimierz Bielecki, Albert Cohen, Marek Palkowski, Krzysztof Siedlecki
Issue Date:July 2009
pp. 73-80
Automatic coarse-grained parallelization of program loops is of great importance for multi-core computing systems. This paper presents a comparison of Iteration Space Slicing and Affine Transformation Framework algorithms aimed at extracting coarse-grained ...
 
Automatic Correction of Loop Transformations
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Nicolas Vasilache, Albert Cohen, Louis-Noel Pouchet
Issue Date:September 2007
pp. 292-304
Loop nest optimization is a combinatorial problem. Due to the growing complexity of modern architectures, it involves two increasingly difficult tasks: (1) analyzing the profitability of sequences of transformations to enhance parallelism, locality, and re...
 
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Louis-Noel Pouchet, Cedric Bastoul, Albert Cohen, Nicolas Vasilache
Issue Date:March 2007
pp. 144-156
Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compilers. Because compiler heuristics rely on simplistic performance model...
 
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Patrick Carribault, Albert Cohen, William Jalby
Issue Date:September 2005
pp. 291-302
A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly accurate on wide-issue architectures with large register banks, w...
 
Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors
Found in: SC Conference
By David Parello, Olivier Temam, Albert Cohen, Jean-Marie Verdun
Issue Date:November 2004
pp. 15
Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers. As a result, compiler efficiency tends to decrease. Currently, the trend is on top-down approaches: static compilers a...
 
Instance-Wise Reaching Definition Analysis for Recursive Programs using Context-Free Transductions
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Albert Cohen, Jean-François Collard
Issue Date:October 1998
pp. 332
Automatic parallelization of recursive programs is still an open problem today, lacking suitable and precise static analyses. We present a novel reaching definition framework based on context-free transductions. The technique achieves a global and precise ...
 
Hybrid Hexagonal/Classical Tiling for GPUs
Found in: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14)
By Albert Cohen, Justin Holewinski, P. Sadayappan, Sven Verdoolaege, Tobias Grosser
Issue Date:February 2014
pp. 66-75
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical hyper-rectangular tiles cannot be used due to the combination of backward and forward dependences along space dimensions. Existing techniques trade temporal d...
     
Code generation for an application-specific VLIW processor with clustered, addressable register files
Found in: Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems (ODES '13)
By Albert Cohen, Christian Bernard, Christian Fabre, Henri-Pierre Charles, Ivan Llopard, Jérôme Martin
Issue Date:February 2013
pp. 11-19
Modern compilers integrate recent advances in compiler construction, intermediate representations, algorithms and programming language front-ends. Yet code generation for application-specific architectures benefits only marginally from this trend, as most ...
     
Parallelizing and optimizing compilation for synchronous languages: new directions for high-performance embedded systems
Found in: Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems (ODES '13)
By Albert Cohen
Issue Date:February 2013
pp. 1-1
Synchronous data-flow programming appeared in the 80s, with Lustre and Signal, applied to the design, modeling, and programming of safety-critical embedded systems. Its scientific and industrial success derives from the ability to master the high levels of...
     
Correct and efficient work-stealing for weak memory models
Found in: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13)
By Albert Cohen, Antoniu Pop, Francesco Zappa Nardelli, Nhat Minh Lê
Issue Date:February 2013
pp. 69-80
Chase and Lev's concurrent deque is a key data structure in shared-memory parallel programming and plays an essential role in work-stealing schedulers. We provide the first correctness proof of an optimized implementation of Chase and Lev's deque on top of...
     
Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra
Found in: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL '13)
By Albert Cohen, Ramakrishna Upadrasta
Issue Date:January 2013
pp. 483-496
Polyhedral compilation has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations remain one important weakness. We address it using sub-polyhedra...
     
Polyhedral parallel code generation for CUDA
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Albert Cohen, Christian Tenllado, Francky Catthoor, José Ignacio Gómez, Juan Carlos Juega, Sven Verdoolaege
Issue Date:January 2013
pp. 1-23
This article addresses the compilation of a sequential program for parallel execution on a modern GPU. To this end, we present a novel source-to-source compiler called PPCG. PPCG singles out for its ability to accelerate computations from any static contro...
     
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Albert Cohen, Antoniu Pop
Issue Date:January 2013
pp. 1-25
We present OpenStream, a data-flow extension of OpenMP to express dynamic dependent tasks. The language supports nested task creation, modular composition, variable and unbounded sets of producers/consumers, and first-class streams. These features, enabled...
     
Improved loop tiling based on the removal of spurious false dependences
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Albert Cohen, Konrad Trifunović, Riyadh Baghdadi, Sven Verdoolaege
Issue Date:January 2013
pp. 1-26
To preserve the validity of loop nest transformations and parallelization, data dependences need to be analyzed. Memory dependences come in two varieties: true dependences or false dependences. While true dependences must be satisfied in order to preserve ...
     
A decoupled local memory allocator
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Özcan Özturk, Albert Cohen, Boubacar Diouf, Can Hantaş, Jens Palsberg
Issue Date:January 2013
pp. 1-22
Compilers use software-controlled local memories to provide fast, predictable, and power-efficient access to critical data. We show that the local memory allocation for straight-line, or linearized programs is equivalent to a weighted interval-graph colori...
     
Practical aggregation of semantical program properties for machine learning based optimization
Found in: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (CASES '10)
By Albert Cohen, Ari Freund, Ayal Zaks, Grigori Fursin, Mircea Namolaru
Issue Date:October 2010
pp. 197-206
Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the pr...
     
Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes
Found in: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (CASES '10)
By Albert Cohen, Antoniu Pop, Cupertino Miranda, Marc Duranton, Philippe Dumont
Issue Date:October 2010
pp. 11-20
Tuning applications for multicore systems involve subtle concurrency concepts and target-dependent optimizations. This paper advocates for a streaming execution model, called ER, where persistent processes communicate and synchronize through a multi-consum...
     
Processor virtualization and split compilation for heterogeneous multicore embedded systems
Found in: Proceedings of the 47th Design Automation Conference (DAC '10)
By Albert Cohen, Erven Rohou
Issue Date:June 2010
pp. 102-107
Embedded multiprocessors have always been heterogeneous, driven by the power-efficiency and compute-density of hardware specialization. We aim to achieve portability and sustained performance of complete applications, leveraging diverse programmable cores....
     
Post-pass periodic register allocation to minimise loop unrolling degree
Found in: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems (LCTES '08)
By Albert Cohen, Mounira Bachir, Sid-Ahmed-Ali Touati
Issue Date:June 2008
pp. 1-16
This paper solves an open problem regarding loop unrolling after periodic register allocation. Although software pipelining is a powerful technique to extract fine-grain parallelism, it generates reuse circuits spanning multiple loop iterations. These circ...
     
Code-size conscious pipelining of imperfectly nested loops
Found in: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture (MEDEA '07)
By Albert Cohen
Issue Date:September 2007
pp. 49-55
This paper is a step towards enabling multidimensional software pipelining of non-perfectly nested loops on memory-constrained architectures. We propose a method to pipeline multiple inner loops without increasing the size of the loop nest, apart from an o...
     
An architecture for distributed wavelet analysis and processing in sensor networks
Found in: Proceedings of the fifth international conference on Information processing in sensor networks (IPSN '06)
By Albert Cohen, David B. Johnson, Raymond S. Wagner, Richard G. Baraniuk, Shu Du
Issue Date:April 2006
pp. 243-250
Distributed wavelet processing within sensor networks holds promise for reducing communication energy and wireless bandwidth usage at sensor nodes. Local collaboration among nodes de-correlates measurements, yielding a sparser data set with significant val...
     
N-synchronous Kahn networks: a relaxed model of synchrony for real-time systems
Found in: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL'06)
By Albert Cohen, Christine Eisenbeis, Claire Pagetti, Florence Plateau, Marc Duranton, Marc Pouzet
Issue Date:January 2006
pp. 180-193
The design of high-performance stream-processing systems is a fast growing domain, driven by markets such like high-end TV, gaming, 3D animation and medical imaging. It is also a surprisingly demanding task, with respect to the algorithmic and conceptual s...
     
Applications of storage mapping optimization to register promotion
Found in: Proceedings of the 18th annual international conference on Supercomputing (ICS '04)
By Albert Cohen, Patrick Carribault
Issue Date:June 2004
pp. 247-256
Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. It is designed to reduce the memory footprint after a wide spectrum of loop transformations, whether based on uniform dependence vectors or more expressive ...
     
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time
Found in: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (SIGMETRICS '03)
By Albert Cohen, Gilles Mouchard, Olivier Temam, Sylvain Girbal
Issue Date:June 2003
pp. 1-12
While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research works, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speedup simulation duri...
     
Monotonic evolution: an alternative to induction variable substitution for dependence analysis
Found in: Proceedings of the 15th international conference on Supercomputing (ICS '01)
By Albert Cohen, David Padua, Jay Hoeflinger, Peng Wu
Issue Date:June 2001
pp. 78-91
We present a new approach to dependence testing in the presence of induction variables. Instead of looking for closed form expressions, our method computes monotonic evolution which captures the direction in which the value of a variable changes. This info...
     