Cache-aware Roofline model: Upgrading the loft
Found in: IEEE Computer Architecture Letters
By Aleksandar Ilic, Frederico Pratas, Leonel Sousa
Issue Date:January 2014
pp. 1-1
The Roofline model graphically represents the attainable upper bound performance of a computer architecture. This paper analyzes the original Roofline model and proposes a novel approach to provide a more insightful performance modeling of modern architect...
 
A compact and scalable RNS architecture
Found in: 2013 IEEE 24th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
By Pedro Miguens Matutino, Ricardo Chaves, Leonel Sousa
Issue Date:June 2013
pp. 125-132
This paper proposes a unified architecture for designing Residue Number System (RNS) based processors for moduli sets with an arbitrary number of channels. Recently, new RNS moduli sets have been proposed in order to increase the dynamic range and reduce t...
 
Accelerating the Computation of Induced Dipoles for Molecular Mechanics with Dataflow Engines
Found in: 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
By Frederico Pratas, Diego Oriato, Oliver Pell, Ricardo A. Mata, Leonel Sousa
Issue Date:April 2013
pp. 177-180
In Molecular Mechanics simulations, the treatment of electrostatics is the most computational intensive task. Modern force fields, such as the AMOEBA, which include explicit polarization effects, are particularly computationally demanding. We propose a sta...
 
On Realistic Divisible Load Scheduling in Highly Heterogeneous Distributed Systems
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Aleksandar Ilic, Leonel Sousa
Issue Date:February 2012
pp. 426-433
This paper investigates the problem of scheduling discretely divisible applications in highly heterogeneous distributed platforms which deploy modern desktop systems with limited memory as computing nodes. We propose an algorithm for hierarchical load bala...
 
Binary-to-RNS Conversion Units for moduli {2^n ± 3}
Found in: Digital Systems Design, Euromicro Symposium on
By Pedro Miguens Matutino, Ricardo Chaves, Leonel Sousa
Issue Date:September 2011
pp. 460-467
In this paper Residue Number Systems (RNS) conversion structures from Binary to RNS modulo {2^n ± 3} are proposed. These structures are based on arithmetic calculations without the need for Lookup Tables as in the related art. Additionally, the required 4:...
 
Arithmetic Units for RNS Moduli {2^n-3} and {2^n+3} Operations
Found in: Digital Systems Design, Euromicro Symposium on
By Pedro Miguel Matutino, Ricardo Chaves, Leonel Sousa
Issue Date:September 2010
pp. 243-246
A new moduli set {2^n-1, 2^n+3, 2^n+1, 2^n-3} has recently been proposed to represent numbers in Residue Number Systems (RNS), increasing the number of channels. With this, the processing time can be reduced by simultaneously exploiting the carry-free characte...
 
Efficient Independent Component Analysis on a GPU
Found in: Computer and Information Technology, International Conference on
By Rui Ramalho, Pedro Tomás, Leonel Sousa
Issue Date:July 2010
pp. 1128-1133
Several problems in the signal processing field require generating suitable representations of data. One possible form of representation is given by independent component analysis (ICA). The computation of these representations can be quite expensive, espe...
 
Collaborative execution environment for heterogeneous parallel systems
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Aleksandar Ilic, Leonel Sousa
Issue Date:April 2010
pp. 1-8
Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. However, to take advantage of this power we have to orchestrate the use of processing units with different characteristics. Such distributed m...
 
Massively LDPC Decoding on Multicore Architectures
Found in: IEEE Transactions on Parallel and Distributed Systems
By Gabriel Falcao, Leonel Sousa, Vitor Silva
Issue Date:February 2011
pp. 309-322
Unlike usual VLSI approaches necessary for the computation of intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. Algorithms and data structures suitable for parallel computing are proposed in...
 
CaravelaMPI: Message Passing Interface for Parallel GPU-Based Applications
Found in: Parallel and Distributed Computing, International Symposium on
By Shinichi Yamagiwa, Leonel Sousa
Issue Date:July 2009
pp. 161-168
With the ever increasing demand for high quality 3D image processing on markets such as cinema and gaming, graphics processing units (GPUs) capabilities have shown tremendous advances. Although GPU-based cluster computing, which uses GPUs as the processing...
 
Distributed Software Platform for Automation and Control of General Anaesthesia
Found in: Parallel and Distributed Computing, International Symposium on
By Gesner Passos, Nuno Roma, Bertinho Andrade da Costa, Leonel Sousa, Joao Miranda Lemos
Issue Date:July 2009
pp. 135-142
A parallel computer architecture and a distributed software platform for automation and control of general anesthesia is proposed in this paper. The system is a prototype research platform, intended to help on the development, simulation and test of new co...
 
Compact and Flexible Microcoded Elliptic Curve Processor for Reconfigurable Devices
Found in: Field-Programmable Custom Computing Machines, Annual IEEE Symposium on
By Samuel Antão, Ricardo Chaves, Leonel Sousa
Issue Date:April 2009
pp. 193-200
This paper presents a very compact and flexible processor to support Elliptic Curve (EC) cryptosystems based on GF(2^m) finite fields. This processor can be customized with a two-level microinstruction hierarchy that allows for customization of both field ...
 
Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units
Found in: Digital Systems Design, Euromicro Symposium on
By Nuno Sebastião, Tiago Dias, Nuno Roma, Paulo Flores, Leonel Sousa
Issue Date:September 2008
pp. 181-188
The implementation of a recently proposed IP core of an efficient motion estimation co-processor is considered. Some significant functional improvements to the base architecture are proposed, as well as the presentation of a detailed description of the int...
 
An RNS based Specific Processor for Computing the Minimum Sum-of-Absolute-Differences
Found in: Digital Systems Design, Euromicro Symposium on
By Pedro Miguens Matutino, Leonel Sousa
Issue Date:September 2008
pp. 768-775
The Sum of Absolute Differences (SAD) is a distance metric commonly used to determine the similarity between two data sets. A very recent method for directly comparing the magnitude of two numbers represented in Residue Number Systems (RNS) leads to the po...
 
Distributed Web-based Platform for Computer Architecture Simulation
Found in: Parallel and Distributed Computing, International Symposium on
By Aleksandar Ilic, Frederico Pratas, Leonel Sousa
Issue Date:July 2008
pp. 317-324
Computer architecture simulation and modeling require a huge amount of time and resources, not only for the simulation itself but also regarding the configuration and submission procedures. A quite common simulation toolset (SimpleScalar) has been used to ...
 
Heuristic Optimization Methods for Improving Performance of Recursive General Purpose Applications on GPUs
Found in: Parallel and Distributed Computing, International Symposium on
By Shinichi Yamagiwa, Koichi Wada, Leonel Sousa
Issue Date:July 2008
pp. 325-332
Due to the demand of high definition graphics presentation in gaming and video market, Graphics Processing Units (GPUs) have drastically increased their computational capacities. General-Purpose computation on GPUs uses the fragment shader multicore of the...
 
Design and implementation of a tool for modeling and programming deadlock free meta-pipeline applications
Found in: Parallel and Distributed Processing Symposium, International
By Shinichi Yamagiwa, Leonel Sousa
Issue Date:April 2008
pp. 1-8
The Caravela platform has been designed to develop a parallel and distributed stream-based computing paradigm, namely supported on the pipeline processing approach herein designated by meta-pipeline. This paper is focused on the design and implementation o...
 
Merged Computation for Whirlpool Hashing
Found in: Design, Automation and Test in Europe Conference and Exhibition
By Ricardo Chaves, Georgi Kuzmanov, Leonel Sousa, Stamatis Vassiliadis
Issue Date:March 2008
pp. 272-275
This paper presents an improved hardware structure for the computation of the Whirlpool hash function. By merging the round key computation with the data compression and by using embedded memories to perform part of the Galois Field (2^8...
 
A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures
Found in: Complex, Intelligent and Software Intensive Systems, International Conference
By Svetislav Momcilovic, Leonel Sousa
Issue Date:March 2008
pp. 831-836
The new Advanced Video Coding (AVC) standards further exploit temporal correlation between images on a sequence by considering multiple reference frames and variable block sizes. It improves the compression efficiency at the cost of a significant computati...
 
Meta-Pipeline: A New Execution Mechanism for Distributed Pipeline Processing
Found in: Parallel and Distributed Computing, International Symposium on
By Shinichi Yamagiwa, Leonel Sousa, Tomás Brandão
Issue Date:July 2007
pp. 5
The Caravela platform has been proposed by the authors of this paper to perform distributed stream-based computing on general purpose computation. This platform uses a secured execution unit called flow-model that prevents remote users to touch local infor...
 
Caravela: A Novel Stream-Based Distributed Computing Environment
Found in: Computer
By Shinichi Yamagiwa, Leonel Sousa
Issue Date:May 2007
pp. 70-77
Distributed computing implies sharing computation, data, and network resources around the world. The Caravela environment applies a proposed flow model for stream computing on graphics processing units that encapsulates a program to be executed in local or...
 
The Midlifekicker Microarchitecture Evaluation Metric
Found in: Application-Specific Systems, Architectures and Processors, IEEE International Conference on
By Stamatis Vassiliadis, Leonel Sousa, Georgi N. Gaydadjiev
Issue Date:July 2005
pp. 92-100
We introduce the midlifekicker metric for evaluating microarchitectures mostly during the design process. We assume a microarchitecture designed at a time T-1 and estimate if a new microarchitecture projected for time T has advantages over the mic...
 
On the Implementation and Evaluation of Berkeley Sockets on Maestro2 cluster computing environment
Found in: Parallel and Distributed Computing, International Symposium on
By Ricardo Guapo, Leonel Sousa, Shinichi Yamagiwa
Issue Date:July 2005
pp. 317-324
The support on cluster environments of
 
Communication Contention in Task Scheduling
Found in: IEEE Transactions on Parallel and Distributed Systems
By Oliver Sinnen, Leonel A. Sousa
Issue Date:June 2005
pp. 503-515
Abstract: Task scheduling is an essential aspect of parallel programming. Most heuristics for this NP-hard problem are based on a simple system model that assumes fully connected processors and concurrent interprocessor communica...
 
{2^n + 1, 2^{n+k}, 2^n - 1}: A New RNS Moduli Set Extension
Found in: Digital Systems Design, Euromicro Symposium on
By Ricardo Chaves, Leonel Sousa
Issue Date:September 2004
pp. 210-217
The increasing usage of Residual Number System (RNS) in signal processing applications demands the development of new and more adaptable RNS moduli sets and arithmetic units. This paper presents a new adaptable moduli set extension for the traditional modu...
 
Task Scheduling: Considering the Processor Involvement in Communication
Found in: Parallel and Distributed Computing, International Symposium on
By Oliver Sinnen, Leonel Sousa
Issue Date:July 2004
pp. 328-335
Classical task scheduling employs a very simplified model of the target parallel system. Experiments demonstrated that this leads to inaccurate and inefficient schedules. Contention aware scheduling heuristics take the contention for communication resource...
 
RDSP: A RISC DSP based on Residue Number System
Found in: Digital Systems Design, Euromicro Symposium on
By Ricardo Chaves, Leonel Sousa
Issue Date:September 2003
pp. 128
This paper is focused on low power programmable fast Digital Signal Processors (DSP) design based on a configurable 5-stage RISC core architecture and on Residue Number Systems (RNS). Several innovative aspects are introduced at the control and datapath ar...
 
Comparison of Contention Aware List Scheduling Heuristics for Cluster Computing
Found in: Parallel Processing Workshops, International Conference on
By Oliver Sinnen, Leonel Sousa
Issue Date:September 2001
pp. 0382
Abstract: In the area of static scheduling, list scheduling is one of the most common heuristics for the temporal and spatial assignment of a Directed Acyclic Graph (DAG) to a target machine. As most heuristics, list scheduling assumes fully connected homo...
 
The CRNS framework and its application to programmable and reconfigurable cryptography
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Leonel Sousa, Samuel Antão
Issue Date:January 2013
pp. 1-25
This article proposes the Computing with the Residue Number System (CRNS) framework, which aims at the design automation of accelerators for Modular Arithmetic (MA). The framework provides a comprehensive set of tools ranging from a programming language and...
     
Iterative induced dipoles computation for molecular mechanics on GPUs
Found in: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU '10)
By Frederico Pratas, Leonel Sousa, Ricardo A. Mata
Issue Date:March 2010
pp. 111-120
In this work, we present a first step towards the efficient implementation of polarizable molecular mechanics force fields with GPU acceleration. The computational bottleneck of such applications is found in the treatment of electrostatics, where higher-or...
     
Low power microarchitecture with instruction reuse
Found in: Proceedings of the 2008 conference on Computing frontiers (CF '08)
By Frederico Pratas, Georgi Gaydadjiev, Leonel Sousa, Mladen Berekovic, Stefanos Kaxiras
Issue Date:May 2008
pp. 353-358
Power consumption has become a very important metric and challenging research topic in the design of microprocessors in the recent years. The goal of this work is to improve power efficiency of superscalar processors through instruction reuse at the execut...
     
Merged computation for Whirlpool hashing
Found in: Proceedings of the conference on Design, automation and test in Europe (DATE '08)
By Georgi Kuzmanov, Leonel Sousa, Ricardo Chaves, Stamatis Vassiliadis
Issue Date:March 2008
pp. 1-30
This paper presents an improved hardware structure for the computation of the Whirlpool hash function. By merging the round key computation with the data compression and by using embedded memories to perform part of the Galois Field (2^8) multiplication, a ...
     
Design and implementation of a stream-based distributed computing platform using graphics processing units
Found in: Proceedings of the 4th international conference on Computing frontiers (CF '07)
By Leonel Sousa, Shinichi Yamagiwa
Issue Date:May 2007
pp. 197-204
Anonymous use of computing resources spread over the world becomes one of the main goals in GRID environments. In GRID-based computing, the security of users or of contributors of computing resources is crucial to execute processes in a safe way. This pape...
     
A programmable cellular neural network circuit
Found in: Proceedings of the 17th symposium on Integrated circuits and system design (SBCCI '04)
By Jorge R. Fernandes, Leonel Sousa, Michel Leong, Pedro Vasconcelos
Issue Date:September 2004
pp. 186-191
In this paper we propose and develop a fully programmable CNN circuit. The CNN coefficients are digitally programmable using a Digital to Analog Converter (DAC), resulting in added flexibility. CNNs with 4x4 and 16x16 cells are designed and tested, exhibiti...
     