An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication
Found in: IEEE Transactions on Parallel and Distributed Systems
By Vasileios Karakasis, Theodoros Gkountouvas, Kornilios Kourtis, Georgios Goumas, Nectarios Koziris
Issue Date:October 2013
pp. 1930-1940
Sparse matrix-vector multiplication (SpM×V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM×V kernel, that inhibits it from achi...
 
Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore
Found in: 2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Theodoros Gkountouvas, Vasileios Karakasis, Kornilios Kourtis, Georgios Goumas, Nectarios Koziris
Issue Date:May 2013
pp. 273-283
Symmetric sparse matrices arise often in the solution of sparse linear systems. Exploiting the non-zero element symmetry in order to reduce the overall matrix size is very tempting for optimizing the symmetric Sparse Matrix-Vector Multiplication kernel (Sp...
 
Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA
Found in: 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
By Dimitrios Tsoumakos, Ioannis Konstantinou, Christina Boumpouka, Spyros Sioutas, Nectarios Koziris
Issue Date:May 2013
pp. 34-41
This work presents TIRAMOLA, a cloud-enabled, open-source framework to perform automatic resizing of NoSQL clusters according to user-defined policies. Decisions on adding or removing worker VMs from a cluster are modeled as a Markov Decision Process and t...
 
~okeanos: Building a Cloud, Cluster by Cluster
Found in: IEEE Internet Computing
By Vangelis Koukis, Constantinos Venetsanopoulos, Nectarios Koziris
Issue Date:May 2013
pp. 67-71
Traditional cluster management matters even when building a production cloud. Here, the authors present their approach to building the ~okeanos cloud service. They describe how they built the open source software that powers it to fit the cluster managemen...
 
Characterizing thread placement in the IBM POWER7 processor
Found in: 2012 IEEE International Symposium on Workload Characterization (IISWC)
By Stelios Manousopoulos, Miquel Moreto, Roberto Gioiosa, Nectarios Koziris, Francisco J. Cazorla
Issue Date:November 2012
pp. 120-130
There is a clear trend in current processor design towards the combination of several thread level parallelism paradigms on the same chip, exemplified by processors such as the IBM POWER7. In those processors, the way threads are assigned to different hard...
 
An Approach to Parallelize Kruskal's Algorithm Using Helper Threads
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Anastasios Katsigiannis, Nikos Anastopoulos, Konstantinos Nikas, Nectarios Koziris
Issue Date:May 2012
pp. 1601-1610
In this paper we present a Helper Threading scheme used to parallelize efficiently Kruskal's Minimum Spanning Forest algorithm. This algorithm is known for exhibiting inherently sequential characteristics. More specifically, the strict order by which the a...
 
Efficient Updates for Web-Scale Indexes over the Cloud
Found in: 2012 IEEE International Conference on Data Engineering Workshops (ICDEW)
By Panagiotis Antonopoulos, Ioannis Konstantinou, Dimitrios Tsoumakos, Nectarios Koziris
Issue Date:April 2012
pp. 135-142
In this paper, we present a distributed system which enables fast and frequent updates on web-scale Inverted Indexes. The proposed update technique allows incremental processing of new or modified data and minimizes the changes required to the index, signi...
 
Fast and Cost-Effective Online Load-Balancing in Distributed Range-Queriable Systems
Found in: IEEE Transactions on Parallel and Distributed Systems
By Ioannis Konstantinou, Dimitrios Tsoumakos, Nectarios Koziris
Issue Date:August 2011
pp. 1350-1364
Distributed systems such as Peer-to-Peer overlays have been shown to efficiently support the processing of range queries over large numbers of participating hosts. In such systems, uneven load allocation has to be effectively tackled in order to minimize o...
 
Solving the advection PDE on the cell broadband engine
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Georgios Rokos, Gerassimos Peteinatos, Georgia Kouveli, Georgios Goumas, Kornilios Kourtis, Nectarios Koziris
Issue Date:April 2010
pp. 1-8
In this paper we present the venture of porting two different algorithms for solving the two-dimensional advection PDE on the CBE platform, an in-place and an out-of-place one, and compare their computational performance, completion time and code productiv...
 
A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures
Found in: Computational Science and Engineering, IEEE International Conference on
By Vasileios Karakasis, Georgios Goumas, Nectarios Koziris
Issue Date:August 2009
pp. 247-256
Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cann...
 
Early experiences on accelerating Dijkstra's algorithm using transactional memory
Found in: Parallel and Distributed Processing Symposium, International
By Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas, Nectarios Koziris
Issue Date:May 2009
pp. 1-8
In this paper we use Dijkstra's algorithm as a challenging, hard to parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurre...
 
Exploring the effect of block shapes on the performance of sparse kernels
Found in: Parallel and Distributed Processing Symposium, International
By Vasileios Karakasis, Georgios Goumas, Nectarios Koziris
Issue Date:May 2009
pp. 1-8
In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by performing an extensive experimental evaluation of the most widespread blocki...
 
Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression
Found in: Parallel Processing, International Conference on
By Kornilios Kourtis, Georgios Goumas, Nectarios Koziris
Issue Date:September 2008
pp. 511-519
The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth requirements. To decrease memory contention and improve the performance of the kernel we ...
 
Support for Concept Hierarchies in DHTs
Found in: Peer-to-Peer Computing, IEEE International Conference on
By Athanasia Asiki, Katerina Doka, Dimitrios Tsoumakos, Nectarios Koziris
Issue Date:September 2008
pp. 121-124
Concept hierarchies greatly help in the organization and reuse of information and are widely used in a variety of applications, such as data warehouses. In this paper, we describe a method for efficiently storing and querying data organized into concept hi...
 
Synchronized send operations for efficient streaming block I/O over Myrinet
Found in: Parallel and Distributed Processing Symposium, International
By Evangelos Koukis, Anastassios Nanos, Nectarios Koziris
Issue Date:April 2008
pp. 1-8
Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypass...
 
Facilitating efficient synchronization of asymmetric threads on hyper-threaded processors
Found in: Parallel and Distributed Processing Symposium, International
By Nikos Anastopoulos, Nectarios Koziris
Issue Date:April 2008
pp. 1-8
So far, the privileged instructions MONITOR and MWAIT introduced with Intel Prescott core, have been used mostly for inter-thread synchronization in operating systems code. In a hyper-threaded processor, these instructions offer a “performance-optimized” w...
 
Evaluation of dynamic scheduling methods in simulations of storm-time ion acceleration
Found in: Parallel and Distributed Processing Symposium, International
By Ioannis Riakiotakis, Georgios Goumas, Nectarios Koziris, Fiori-Anastasia Metallinou, Ioannis A. Daglis
Issue Date:April 2008
pp. 1-8
In this paper we investigate the applicability of classic dynamic loop scheduling methods on a numerical simulation code that calculates the trajectories of charged particles in the earth’s magnetosphere. The numerical application under consideration inves...
 
Understanding the Performance of Sparse Matrix-Vector Multiplication
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Georgios Goumas, Kornilios Kourtis, Nikos Anastopoulos, Vasileios Karakasis, Nectarios Koziris
Issue Date:February 2008
pp. 283-292
In this paper we revisit the performance issues of the widely used sparse matrix-vector multiplication kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, ...
 
Global-scale peer-to-peer file services with DFS
Found in: Grid Computing, IEEE/ACM International Workshop on
By Antony Chazapis, Georgios Tsoukalas, Georgios Verigakis, Kornilios Kourtis, Aristidis Sotiropoulos, Nectarios Koziris
Issue Date:September 2007
pp. 251-258
The global inter-networking infrastructure that has become essential for contemporary day-to-day computing and communication tasks, has also enabled the deployment of several large-scale data sharing overlays. Communities collaboratively aggregate and dist...
 
Efficient Block Device Sharing over Myrinet with Memory Bypass
Found in: Parallel and Distributed Processing Symposium, International
By Evangelos Koukis, Nectarios Koziris
Issue Date:March 2007
pp. 29
Efficient sharing of block devices over an interconnection network is an important step in deploying a shared-disk parallel filesystem on a cluster of SMPs. In this paper we present gmblock, a client/server system for network sharing of storage devices over...
 
Coarse-grain Parallel Execution for 2-dimensional PDE Problems
Found in: Parallel and Distributed Processing Symposium, International
By Georgios Goumas, Nikolaos Drosinos, Vasileios Karakasis, Nectarios Koziris
Issue Date:March 2007
pp. 381
This paper presents a new approach for the execution of coarse-grain (tiled) parallel SPMD code for applications derived from the explicit discretization of 2-dimensional PDE problems with finite-differencing schemes. Tiling transformation is an efficient ...
 
Exploring the Performance Limits of Simultaneous Multithreading for Scientific Codes
Found in: Parallel Processing, International Conference on
By Evangelia Athanasaki, Nikos Anastopoulos, Kornilios Kourtis, Nectarios Koziris
Issue Date:August 2006
pp. 45-54
Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. The speedup of a single application that is parallelized into multiple threads, is often se...
 
Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs
Found in: Parallel and Distributed Systems, International Conference on
By Evangelos Koukis, Nectarios Koziris
Issue Date:July 2006
pp. 345-354
Symmetric Multiprocessors (SMPs), combined with modern interconnection technologies, are commonly used to build cost-effective compute clusters. However, contention among processors for access to shared resources, such as the main memory bus and the NIC, can l...
 
A Peer-to-Peer Replica Management Service for High-Throughput Grids
Found in: Parallel Processing, International Conference on
By Antony Chazapis, Antonis Zissimos, Nectarios Koziris
Issue Date:June 2005
pp. 443-451
Future high-throughput Grids may integrate millions or even billions of processing and data storage nodes. Services provided by the underlying Grid infrastructure may have to be able to scale to capacities not even imaginable today. In this paper we concen...
 
Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops
Found in: Parallel Processing Workshops, International Conference on
By Nikolaos Drosinos, Nectarios Koziris
Issue Date:June 2005
pp. 113-120
This paper emphasizes load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid parallel programming models usually suffer from intrinsic load imbalance between thr...
 
A Tile Size Selection Analysis for Blocked Array Layouts
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Evangelia Athanasaki, Nectarios Koziris, Panayiotis Tsanakas
Issue Date:February 2005
pp. 70-80
Efficient use of the memory hierarchy is essential for good performance due to the ever increasing gap between processor and memory speed. Program transformations such as loop tiling have been shown to be an effective approach to improving locality and cac...
 
Memory Bandwidth Aware Scheduling for SMP Cluster Nodes
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Evangelos Koukis, Nectarios Koziris
Issue Date:February 2005
pp. 187-196
Clusters of SMPs are becoming increasingly common. However, the shared memory design of SMPs and the consequential contention between system processors for access to main memory can limit their efficiency significantly. Moreover, the continuous improvement...
 
Fast Indexing for Blocked Array Layouts to Improve Multi-Level Cache Locality
Found in: Interaction between Compilers and Computer Architecture, Annual Workshop on
By Evangelia Athanasaki, Nectarios Koziris
Issue Date:February 2004
pp. 109-119
One of the key challenges computer architects and compiler writers are facing, is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used, ...
 
Improving Cache Locality with Blocked Array Layouts
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Evangelia Athanasaki, Nectarios Koziris
Issue Date:February 2004
pp. 308
Minimizing cache misses is one of the most important factors to reduce average latency for memory accesses. Tiled codes modify the instruction stream to exploit cache locality for array accesses. In this paper, we further reduce cache misses, restructuring...
 
Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Maria Athanasaki, Evangelos Koukis, Nectarios Koziris
Issue Date:February 2004
pp. 424
In this paper we propose several alternative methods for the compile time scheduling of Tiled Nested Loops onto a fixed size parallel architecture. We investigate the distribution of tiles among processors, provided that we have chosen either a non-overlap...
 
An Efficient Code Generation Technique for Tiled Iteration Spaces
Found in: IEEE Transactions on Parallel and Distributed Systems
By Georgios Goumas, Maria Athanasaki, Nectarios Koziris
Issue Date:October 2003
pp. 1021-1034
This paper presents a novel approach for the problem of generating tiled code for nested for-loops, transformed by a tiling transformation. Tiling or supernode transformation has been widely used to improve locality in ...
 
Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces
Found in: SC Conference
By Maria Athanasaki, Aristidis Sotiropoulos, Georgios Tsoukalas, Nectarios Koziris
Issue Date:November 2002
pp. 23
This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We present a novel, pipelined scheduling approach which takes advantage of DMA communication mode, to send data to other nodes, w...
 
Compiling Tiled Iteration Spaces for Clusters
Found in: Cluster Computing, IEEE International Conference on
By Georgios Goumas, Nikolaos Drosinos, Maria Athanasaki, Nectarios Koziris
Issue Date:September 2002
pp. 360
This paper presents a complete end-to-end framework to generate automatic message-passing code for tiled iteration spaces. It considers general parallelepiped tiling transformations and general convex iteration spaces. We aim to address all problems concer...
 
A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping
Found in: Parallel Processing Workshops, International Conference on
By Maria Athanasaki, Aristidis Sotiropoulos, Georgios Tsoukalas, Nectarios Koziris
Issue Date:August 2002
pp. 559
This paper proposes a novel approach for the parallel execution of tiled Iteration Spaces onto a cluster of SMP PC nodes. Each SMP node has multiple CPUs and a single memory mapped PCI-SCI Network Interface Card. We apply a hyperplane-based grouping transf...
 
Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By Aristidis Sotiropoulos, Georgios Tsoukalas, Nectarios Koziris
Issue Date:May 2002
pp. 238
This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. We make use of DMA communication mode, to send data to other nodes, while the CPU performs useful calculations. Zero-copy communi...
 
Enhancing the Performance of Tiled Loop Execution onto Clusters Using Memory Mapped Network Interfaces and Pipelined Schedules
Found in: Parallel and Distributed Processing Symposium, International
By Aristidis Sotiropoulos, Georgios Tsoukalas, Nectarios Koziris
Issue Date:April 2002
pp. 0166
This paper describes the performance benefits attained using enhanced network interfaces to achieve low latency communication. Our experimental testbed concerns the parallel execution of tiled nested loops onto a Linux PC cluster with PCI-SCI NICs (Dolphi...
 
Geometric Scheduling of 2-D Uniform Dependence Loops
Found in: Parallel and Distributed Systems, International Conference on
By Ioannis Drositis, Theodore Andronikos, Aggelos Kokorogiannis, George Papakonstantinou, Nectarios Koziris
Issue Date:June 2001
pp. 0259
One of the primary tasks in the area of uniform dependence loops is predicting the execution propagation, as well as finding an optimal time schedule. In this work, the problem of scheduling using wavefront prediction is presented. The geometric...
 
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
Found in: Parallel and Distributed Processing Symposium, International
By Georgia Goumas, Aristidis Sotiropoulos, Nectarios Koziris
Issue Date:April 2001
pp. 10039a
This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation r...
 
Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces
Found in: Parallel Processing, International Conference on
By Ioannis Drositis, Giorgos Goumas, Nectarios Koziris, Panayiotis Tsanakas, George Papakonstantinou
Issue Date:August 2000
pp. 469
This paper compares three similar loop-grouping methods. All methods are based on projecting the n-dimensional iteration space J^n onto a k-dimensional one, called the projected space, using (n-k) linearly independent vectors. The dimension k is selected dif...
 
Optimal Scheduling for UET-UCT Grids into Fixed Number of Processors
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Theodore Andronikos, Nectarios Koziris, George Papakonstantinou, Panayiotis Tsanakas
Issue Date:January 2000
pp. 237
The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. Many scientific algorithms, which require nearest neighbor communication in a lattice space, are modeled by a task graph with the properties of a simple...
 
An Efficient Algorithm for the Physical Mapping of Clustered Task Graphs onto Multiprocessor Architectures
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Nectarios Koziris, Michael Romesis, Panayiotis Tsanakas, George Papakonstantinou
Issue Date:January 2000
pp. 406
The most important issue in sequential program parallelization is the efficient assignment of computations into different processing elements. In the past, too many approaches were devoted in efficient program parallelization considering various models for...
 
Geometric Scheduling of 2-D UET-UCT Uniform Dependence Loops
Found in: Parallel, Distributed, and Network-Based Processing, Euromicro Conference on
By Ioannis Drositis, Theodore Andronikos, George Manis, George Papakonstantinou, Nectarios Koziris
Issue Date:January 2002
pp. 0343
Finding an optimal time schedule is one of the primary tasks in the area of parallelizing uniform dependence loops. Due to the existence of dependence vectors, the index space of such a loop, is split into subspaces of points that can be executed at differ...
 
Optimal Scheduling for UET-UCT Generalized n-Dimensional Grid Task Graphs
Found in: Parallel Processing Symposium, International
By Theodore Andronikos, Nectarios Koziris, George Papakonstantinou, Panayotis Tsanakas
Issue Date:April 1997
pp. 146
The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. The most frequently used scheduling model for grids is the unit execution - unit communication time (UET-UCT). In this paper we enhance the model of n-...
 
Public vs private cloud usage costs: the StratusLab case
Found in: Proceedings of the 2nd International Workshop on Cloud Computing Platforms (CloudCP '12)
By Ioannis Konstantinou, Nectarios Koziris, Evangelos Floros
Issue Date:April 2012
pp. 1-6
Cloud computing claims to offer multiple advantages compared to "traditional" computing infrastructures. These include among others: energy efficiency, reduction of the overall administration costs, better utilization of hardware resources by co-hosting m...
     
On the elasticity of NoSQL databases over cloud management platforms
Found in: Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM '11)
By Christina Boumpouka, Dimitrios Tsoumakos, Evangelos Angelou, Ioannis Konstantinou, Nectarios Koziris
Issue Date:October 2011
pp. 2385-2388
NoSQL databases focus on analytical processing of large scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance and direc...
     
Exploiting compression opportunities to improve SpMxV performance on shared memory systems
Found in: ACM Transactions on Architecture and Code Optimization (TACO)
By Georgios Goumas, Kornilios Kourtis, Nectarios Koziris
Issue Date:December 2010
pp. 1-31
The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared memory systems, due to the streaming nature of its data access pattern. To decrease memory contention and improve kernel performance we propose two compression schemes: ...
     
Brown dwarf: a P2P data-warehousing system
Found in: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10)
By Dimitrios Tsoumakos, Katerina Doka, Nectarios Koziris
Issue Date:October 2010
pp. 1945-1946
In this demonstration we present the Brown Dwarf, a distributed system designed to efficiently store, query and update multidimensional data. Deployed on any number of commodity nodes, our system manages to distribute large volumes of data over network pee...
     
Distributing the power of OLAP
Found in: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC '10)
By Dimitrios Tsoumakos, Katerina Doka, Nectarios Koziris
Issue Date:June 2010
pp. 324-327
In this paper we present the Brown Dwarf, a distributed system designed to efficiently store, query and update multidimensional data over an unstructured Peer-to-Peer overlay, without the use of any proprietary tool. Brown Dwarf manages to distribute a hig...
     
An adaptive online system for efficient processing of hierarchical data
Found in: Proceedings of the 18th ACM international symposium on High performance distributed computing (HPDC '09)
By Athanasia Asiki, Dimitrios Tsoumakos, Nectarios Koziris
Issue Date:June 2009
pp. 151-166
Concept hierarchies greatly help in the organization and reuse of information and are widely used in a variety of information systems applications. In this paper, we describe a method for efficiently storing and querying data organized into concept hierarc...
     
HiPPIS: an online P2P system for efficient lookups on d-dimensional hierarchies
Found in: Proceeding of the 10th ACM workshop on Web information and data management (WIDM '08)
By Dimitrios Tsoumakos, Katerina Doka, Nectarios Koziris
Issue Date:October 2008
pp. 1-2
In this paper we describe HiPPIS, a system that enables efficient storage and on-line querying of multidimensional data organized into concept hierarchies and dispersed over a network. Our scheme utilizes an adaptive algorithm that automatically adjusts th...
     