A Very Fast Simulator for Exploring the Many-Core Future
Found in: Parallel and Distributed Processing Symposium, International
By Olivier Certner, Zheng Li, Arun Raman, Olivier Temam
Issue Date:May 2011
pp. 443-454
Although multi-core architectures with a large number of cores (
 
A Practical Approach for Reconciling High and Predictable Performance in Non-Regular Parallel Programs
Found in: Design, Automation and Test in Europe Conference and Exhibition
By Olivier Certner, Zheng Li, Pierre Palatin, Olivier Temam, Frederic Arzel, Nathalie Drach
Issue Date:March 2008
pp. 740-745
Increasingly complex consumer electronics applications call for embedded processors with higher performance. Multi-cores are capable of delivering the required performance. However, many of these embedded applications must meet some form of soft real-time ...
 
ArchExplorer for Automatic Design Space Exploration
Found in: IEEE Micro
By Veerle Desmet, Sylvain Girbal, Alex Ramirez, Augusto Vega, Olivier Temam
Issue Date:September 2010
pp. 5-15
Growing architectural complexity and stringent time-to-market constraints suggest the need to move architecture design beyond parametric exploration to structural exploration. ArchExplorer is a Web-based permanent and open design-space exploration...
 
A Sampling Method Focusing on Practicality
Found in: IEEE Micro
By Daniel Gracia Pérez, Hugues Berry, Olivier Temam
Issue Date:November 2006
pp. 14-28
This sampling technique, which is hardware-independent and almost entirely transparent to the user, employs a budget-based approach that jointly considers warm-up and sampling costs, presents them as a single parameter to the user, and distributes simulate...
 
Cluster Cache Monitor
Found in: 2013 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
By Guohong Li, Olivier Temam, Zhenyu Liu, Dongsheng Wang, Sanchuan Guo, Chongmin Li
Issue Date:October 2013
pp. 1-8
As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies bec...
 
BenchNN: On the broad potential application scope of hardware neural network accelerators
Found in: 2012 IEEE International Symposium on Workload Characterization (IISWC)
By Tianshi Chen, Yunji Chen, Marc Duranton, Qi Guo, Atif Hashmi, Mikko Lipasti, Andrew Nere, Shi Qiu, Michele Sebag, Olivier Temam
Issue Date:November 2012
pp. 36-45
Recent technology trends have indicated that, although device sizes will continue to scale as they have in the past, supply voltage scaling has ended. As a result, future chips can no longer rely on simply increasing the operational core count to improve p...
 
SWAP: Parallelization through Algorithm Substitution
Found in: IEEE Micro
By Hengjie Li, Wenting He, Yang Chen, Lieven Eeckhout, Olivier Temam, Chengyong Wu
Issue Date:July 2012
pp. 54-67
By explicitly indicating which algorithms they use and encapsulating these algorithms within software components, programmers make it possible for an algorithm-aware compiler to replace their original algorithm implementations with compatible parallel impl...
 
Statistical performance comparisons of computers
Found in: High-Performance Computer Architecture, International Symposium on
By Tianshi Chen, Yunji Chen, Qi Guo, Olivier Temam, Yue Wu, Weiwu Hu
Issue Date:February 2012
pp. 1-12
As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e...
 
How sensitive is processor customization to the workload's input datasets?
Found in: Application Specific Processors, Symposium on
By Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, Olivier Temam, Chengyong Wu, Lieven Eeckhout
Issue Date:June 2011
pp. 1-7
Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a s...
 
CMA: Chip multi-accelerator
Found in: Application Specific Processors, Symposium on
By Dominik Auras, Sylvain Girbal, Hugues Berry, Olivier Temam, Sami Yehia
Issue Date:June 2010
pp. 8-15
Custom acceleration has been a standard choice in embedded systems thanks to the power density and performance efficiency it provides. Parallelism is another orthogonal scalability path that efficiently overcomes the increasing limitation of frequency scal...
 
UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development
Found in: IEEE Computer Architecture Letters
By David August, Jonathan Chang, Sylvain Girbal, Daniel Gracia-Perez, Gilles Mouchard, David A. Penry, Olivier Temam, Neil Vachharajani
Issue Date:July 2007
pp. 45-48
Simulator development is already a huge burden for many academic and industry research groups; future complex or heterogeneous multi-cores, as well as the multiplicity of performance metrics and required functionality, will make matters worse. We present a...
 
Rapidly Selecting Good Compiler Optimizations using Performance Counters
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By John Cavazos, Grigori Fursin, Felix Agakov, Edwin Bonilla, Michael F.P. O'Boyle, Olivier Temam
Issue Date:March 2007
pp. 185-197
Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. There have been severa...
 
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Pierre Palatin, Yves Lhuillier, Olivier Temam
Issue Date:December 2006
pp. 247-258
Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, including those with complex control flow and data structures. And wr...
 
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Found in: Microarchitecture, IEEE/ACM International Symposium on
By Daniel Gracia Pérez, Gilles Mouchard, Olivier Temam
Issue Date:December 2004
pp. 43-54
While most research papers on computer architectures include some performance measurements, these performance numbers tend to be distrusted. Up to the point that, after so many research articles on data cache architectures, for instance, few researchers ha...
 
Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors
Found in: SC Conference
By David Parello, Olivier Temam, Albert Cohen, Jean-Marie Verdun
Issue Date:November 2004
pp. 15
Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers. As a result, compiler efficiency tends to decrease. Currently, the trend is on top-down approaches: static compilers a...
 
From Sequences of Dependent Instructions to Functions: An Approach for Improving Performance without ILP or Speculation
Found in: Computer Architecture, International Symposium on
By Sami Yehia, Olivier Temam
Issue Date:June 2004
pp. 238
In this article, we present an approach for improving the performance of sequences of dependent instructions. We observe that many sequences of instructions can be interpreted as functions. Unlike sequences of instructions, functions can be translated into...
 
VHC: Quickly Building an Optimizer for Complex Embedded Architectures
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Michael Dupré, Nathalie Drach, Olivier Temam
Issue Date:March 2004
pp. 53
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover, they now run increasingly sophisticated control-intensive applications. As a result, developing architecture-spec...
 
A New Optimized Implementation of the SystemC Engine Using Acyclic Scheduling
Found in: Design, Automation and Test in Europe Conference and Exhibition
By Daniel Gracia Pérez, Gilles Mouchard, Olivier Temam
Issue Date:February 2004
pp. 10552
SystemC is rapidly gaining wide acceptance as a simulation framework for SoC and embedded processors. While its main assets are modularity and the very fact it is becoming a de facto standard, the evolution of the SystemC framework (from version 0...
 
On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance — Matrix-Multiply Revisited
Found in: SC Conference
By David Parello, Olivier Temam, Jean-Marie Verdun
Issue Date:November 2002
pp. 31
As the complexity of processor architectures increases, there is a widening gap between peak processor performance and sustained processor performance so that programs now tend to exploit only a fraction of available performance. While there is a tremendou...
 
An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels
Found in: IEEE Transactions on Computers
By Olivier Temam
Issue Date:February 1999
pp. 150-158
In this study, we present an extension of Belady's MIN algorithm that optimally and simultaneously exploits spatial and temporal locality. Thus, this algorithm provides a performance upper bound of upper memory levels. ...
 
A Cache Visualization Tool
Found in: Computer
By Eric van der Deijl, Gerco Kanbier, Olivier Temam, Elena D. Granston
Issue Date:July 1997
pp. 71-78
Cache performance strongly influences the overall performance of software. As a result, researchers continue to use cache simulators to analyze cache performance and optimization. Most cache simulators, however, provide only raw, global informatio...
 
ArchRanker: A ranking approach to design space exploration
Found in: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
By Tianshi Chen, Qi Guo, Ke Tang, Olivier Temam, Zhiwei Xu, Zhi-Hua Zhou, Yunji Chen
Issue Date:June 2014
pp. 85-96
Architectural Design Space Exploration (DSE) is a notoriously difficult problem due to the exponentially large size of the design space and long simulation times. Previously, many studies proposed to formulate DSE as a regression problem which predicts arc...
   
Statistical Performance Comparisons of Computers
Found in: IEEE Transactions on Computers
By Tianshi Chen, Qi Guo, Olivier Temam, Yue Wu, Yungang Bao, Zhiwei Xu, Yunji Chen
Issue Date:April 2014
pp. 1
As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e...
 
Scalable hardware support for conditional parallelization
Found in: Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10)
By Jose Duato, Olivier Certner, Olivier Temam, Zheng Li
Issue Date:September 2010
pp. 157-168
Parallel programming approaches based on task division/spawning are getting increasingly popular because they provide for a simple and elegant abstraction of parallelization, while achieving good performance on workloads which are traditionally complex to ...
     
A practical approach for reconciling high and predictable performance in non-regular parallel programs
Found in: Proceedings of the conference on Design, automation and test in Europe (DATE '08)
By Frederic Arzel, Nathalie Drach, Olivier Certner, Olivier Temam, Pierre Palatin, Zheng Li
Issue Date:March 2008
pp. 1-30
Increasingly complex consumer electronics applications call for embedded processors with higher performance. Multi-cores are capable of delivering the required performance. However, many of these embedded applications must meet some form of soft real-time ...
     
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
Found in: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (ASPLOS '14)
By Chengyong Wu, Jia Wang, Ninghui Sun, Olivier Temam, Tianshi Chen, Yunji Chen, Zidong Du
Issue Date:March 2014
pp. 269-284
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Netwo...
     
Continuous real-world inputs can open up alternative accelerator designs
Found in: Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13)
By Antoine Joubert, Bilel Belhadj, Olivier Temam, Rodolphe Héliot, Zheng Li
Issue Date:June 2013
pp. 1-12
Motivated by energy constraints, future heterogeneous multi-cores may contain a variety of accelerators, each targeting a subset of the application spectrum. Beyond energy, the growing number of faults steers accelerator research towards fault-tolerant acc...
     
Deconstructing iterative optimization
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Chengyong Wu, Grigori Fursin, Lieven Eeckhout, Olivier Temam, Shuangde Fang, Yang Chen, Yuanjie Huang
Issue Date:September 2012
pp. 1-30
Iterative optimization is a popular compiler optimization approach that has been studied extensively over the past decade. In this article, we deconstruct iterative optimization by evaluating whether it works across datasets and by analyzing why it works. ...
     
A defect-tolerant accelerator for emerging high-performance applications
Found in: Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA '12)
By Olivier Temam
Issue Date:June 2012
pp. 356-367
Due to the evolution of technology constraints, especially energy constraints which may lead to heterogeneous multi-cores, and the increasing number of defects, the design of defect-tolerant accelerators for heterogeneous multi-cores may become a major mic...
     
Iterative optimization for the data center
Found in: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '12)
By Chengyong Wu, Lieven Eeckhout, Olivier Temam, Shuangde Fang, Yang Chen
Issue Date:March 2012
pp. 49-60
Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, each program, if not each data set, potentially favors a different combination. As a result, i...
     
Automatic abstraction and fault tolerance in cortical microarchitectures
Found in: Proceeding of the 38th annual international symposium on Computer architecture (ISCA '11)
By Atif Hashmi, Hugues Berry, Mikko Lipasti, Olivier Temam
Issue Date:June 2011
pp. 1-10
Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional Von Neumann machines. These brain-like ar...
     
Collective optimization: A practical collaborative approach
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Grigori Fursin, Olivier Temam
Issue Date:December 2010
pp. 1-29
Iterative optimization is a popular and efficient research approach to optimize programs using feedback-directed compilation. However, one of the key limitations that prevented widespread use in production compilers and day-to-day practice is the necessity...
     
A memory interface for multi-purpose multi-stream accelerators
Found in: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (CASES '10)
By Hugues Berry, Olivier Temam, Sami Yehia, Sylvain Girbal, Zheng Li
Issue Date:October 2010
pp. 107-116
Power and programming challenges make heterogeneous multi-cores composed of cores and ASICs an attractive alternative to homogeneous multi-cores. Recently, multi-purpose loop-based generated accelerators have emerged as an especially attractive accelerator...
     
The rebirth of neural networks
Found in: Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10)
By Olivier Temam
Issue Date:June 2010
pp. 72-ff
After the hype of the 1990s, where companies like Intel or Philips built commercial hardware systems based on neural networks, the approach quickly lost ground for multiple reasons: hardware neural networks were no match for software neural networks run on...
     
Fast compiler optimisation evaluation using code-feature based performance prediction
Found in: Proceedings of the 4th international conference on Computing frontiers (CF '07)
By Bjorn Franke, Christophe Dubach, Grigori Fursin, John Cavazos, Michael F.P. O'Boyle, Olivier Temam
Issue Date:May 2007
pp. 131-142
Performance tuning is an important and time consuming task which may have to be repeated for each new application and platform. Although iterative optimisation can automate this process, it still requires many executions of different versions of the progra...
     
Automatic performance model construction for the fast software exploration of new hardware designs
Found in: Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems (CASES '06)
By Christophe Dubach, Edwin Bonilla, Felix Agakov, Grigori Fursin, John Cavazos, Michael F. P. O'Boyle, Olivier Temam
Issue Date:October 2006
pp. 24-34
Developing an optimizing compiler for a newly proposed architecture is extremely difficult when there is only a simulator of the machine available. Designing such a compiler requires running many experiments in order to understand how different optimizatio...
     
Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios
Found in: Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications, systems and architecture (MEDEA '04)
By Jean-Francois Collard, Olivier Temam, Sami Yehia
Issue Date:September 2004
pp. 5-es
Indirect memory accesses, where a load is fed by another load, are ubiquitous because of rich data structures and sophisticated software conventions, such as the use of linkage tables and position independent code. Unfortunately, they can be costly: if bot...
     
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time
Found in: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (SIGMETRICS '03)
By Albert Cohen, Gilles Mouchard, Olivier Temam, Sylvain Girbal
Issue Date:June 2003
pp. 1-12
While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research works, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speed up simulation duri...
     
Investigating optimal local memory performance
Found in: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems (ASPLOS-VIII)
By Olivier Temam
Issue Date:October 1998
pp. 205-209
Recent work has demonstrated that cache space is often poorly utilized. However, no previous work has yet demonstrated upper bounds on what a cache or local memory could achieve when exploiting both spatial and temporal locality. Belady's MIN algorithm do...
     
Data caches for superscalar processors
Found in: Proceedings of the 11th international conference on Supercomputing (ICS '97)
By Juan J. Navarro, Olivier Temam, Toni Juan
Issue Date:July 1997
pp. 60-67
To minimize the amount of computation and storage for parallel sparse factorization, sparse matrices have to be reordered prior to factorization. We show that none of the popular ordering heuristics proposed before, namely, multiple minimum degree and nest...
     
A quantitative analysis of loop nest locality
Found in: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems (ASPLOS-VII)
By Kathryn S. McKinley, Olivier Temam
Issue Date:October 1996
pp. 205-209
This paper analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spend the majority of their time in nests, the vast majorit...
     
Improving single-process performance with multithreaded processors
Found in: Proceedings of the 10th international conference on Supercomputing (ICS '96)
By Alexandre Farcy, Olivier Temam
Issue Date:May 1996
pp. 350-357
To minimize the amount of computation and storage for parallel sparse factorization, sparse matrices have to be reordered prior to factorization. We show that none of the popular ordering heuristics proposed before, namely, multiple minimum degree and nest...
     
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
Found in: ACM Transactions on Computer Systems (TOCS)
By Kathryn S. McKinley, Olivier Temam
Issue Date:February 1992
pp. 288-336
This article analyzes and quantifies the locality characteristics of numerical loop nests in order to suggest future directions for architecture and software cache optimizations. Since most programs spend the majority of their time in nests, the vast major...
     
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply
Found in: ACM Transactions on Programming Languages and Systems (TOPLAS)
By Christine Fricker, Olivier Temam, William Jalby
Issue Date:January 1988
pp. 561-575
State-of-the-art data locality optimizing algorithms are targeted for local memories rather than for cache memories. Recent work on cache interferences seems to indicate that these phenomena can severely affect blocked algorithms' cache performance. Because...
     