BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers
Found in: SC Conference
By Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini
Issue Date:November 2003
pp. 57
Buffered CoScheduled MPI (BCS-MPI) introduces a new approach to design the communication layer for large-scale parallel machines. The emphasis of BCS-MPI is on the global coordination of a large number of communicating processes rather than on the traditio...
 
Scalable Hardware-Based Multicast Trees
Found in: SC Conference
By Salvador Coll, José Duato, Fabrizio Petrini, Francisco J. Mora
Issue Date:November 2003
pp. 54
This paper presents an algorithm for implementing optimal hardware-based multicast trees, on networks that provide hardware support for collective communication. Although the proposed methodology can be generalized to a wide class of networks, we apply our...
 
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers
Found in: SC Conference
By Roberto Gioiosa, Jose Carlos Sancho, Song Jiang, Fabrizio Petrini
Issue Date:November 2005
pp. 9
We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented as a kernel thread, specifically designed to provide fault tolera...
 
STORM: Lightning-Fast Resource Management
Found in: SC Conference
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Scott Pakin, Salvador Coll
Issue Date:November 2002
pp. 46
Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore, as cluster sizes increase, the quality of the resource-ma...
 
Performance Evaluation of I/O Traffic and Placement of I/O Nodes on a High Performance Network
Found in: Parallel and Distributed Processing Symposium, International
By Salvador Coll, Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie
Issue Date:April 2002
pp. 0165
A common trend in the design of large-scale clusters is to use a high-performance data network to integrate the processing nodes in a single parallel computer. In these systems the performance of the interconnect can be a limiting factor for the input/outp...
 
The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q
Found in: SC Conference
By Fabrizio Petrini, Darren J. Kerbyson, Scott Pakin
Issue Date:November 2003
pp. 55
In this paper we describe how we improved the effective performance of ASCI Q, the world's second-fastest supercomputer, to meet our expectations. Using an arsenal of performance-analysis techniques including analytical models, custom microbenchmarks, full...
 
Scalable Resource Management in High Performance Computers
Found in: Cluster Computing, IEEE International Conference on
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Salvador Coll
Issue Date:September 2002
pp. 305
Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly-available computers. Although many hardware solutions are available today, the largest challenge in making large-scale clusters usable...
 
The Quadrics Network: High-Performance Clustering Technology
Found in: IEEE Micro
By Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
Issue Date:January 2002
pp. 46-57
The Quadrics network extends the native operating system in processing nodes with a network operating system and specialized hardware support in the network interface. Doing so integrates individual node's address spaces into a single, global, vir...
 
Using Multirail Networks in High-Performance Clusters
Found in: Cluster Computing, IEEE International Conference on
By Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Issue Date:October 2001
pp. 15
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance clusters. We present an extensive experimental comparison of the behavior of variou...
 
Streaming, low-latency communication in on-line trading systems
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Hari Subramoni, Fabrizio Petrini, Virat Agarwal, Davide Pasetto
Issue Date:April 2010
pp. 1-8
This paper presents and evaluates the performance of a prototype of an on-line OPRA data feed decoder. Our work demonstrates that, by using best-in-class commodity hardware, algorithmic innovations and careful design, it is possible to obtain the performan...
 
Fulcrum's FocalPoint FM4000: A Scalable, Low-Latency 10GigE Switch for High-Performance Data Centers
Found in: High-Performance Interconnects, Symposium on
By Uri Cummings, Dan Daly, Rebecca Collins, Virat Agarwal, Fabrizio Petrini, Michael Perrone, Davide Pasetto
Issue Date:August 2009
pp. 42-51
The convergence of different types of networks into a common data center infrastructure poses a superset challenge on the part of the underlying component technology. IP networks are feature-rich, storage networks are lossless with controlled topologies, a...
 
Guest Editors' Introduction: Hot Interconnects
Found in: IEEE Micro
By Keren Bergman, Ron Brightwell, Fabrizio Petrini
Issue Date:July 2009
pp. 5-7
We are pleased to present this special issue of IEEE Micro, featuring articles representing leading-edge activities in high-performance interconnection networks presented at the 2008 IEEE Symposium on High-Performance Interconnects at Stanford University.
 
Cell Multiprocessor Communication Network: Built for Speed
Found in: IEEE Micro
By Michael Kistler, Michael Perrone, Fabrizio Petrini
Issue Date:May 2006
pp. 10-23
Multicore designs promise various power-performance and area-performance benefits. But inadequate design of the on-chip communication network can deprive applications of these benefits. To illuminate this important point in multicore processor design, the ...
 
Current Practice and a Direction Forward in Checkpoint/Restart Implementations for Fault Tolerance
Found in: Parallel and Distributed Processing Symposium, International
By José Carlos Sancho, Fabrizio Petrini, Kei Davis, Roberto Gioiosa, Song Jiang
Issue Date:April 2005
pp. 300b
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hibernation, and fault tolerance. For fault tolerance, in current practice, implemen...
 
Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters
Found in: Parallel and Distributed Processing Symposium, International
By Juan Fernández, Fabrizio Petrini, Eitan Frachtenberg
Issue Date:April 2005
pp. 300a
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their communication are tightly scheduled at a very fine granularity. Thus, BCS-MPI pr...
 
A Performance and Scalability Analysis of the BlueGene/L Architecture
Found in: SC Conference
By Kei Davis, Adolfy Hoisie, Greg Johnson, Darren J. Kerbyson, Mike Lang, Scott Pakin, Fabrizio Petrini
Issue Date:November 2004
pp. 41
Based on a set of measurements done on the 512-node 500MHz prototype and early results on a 2048 node 700MHz BlueGene/L machine at IBM Watson, we present a performance and scalability analysis of the architecture from low-level characteristics to ...
 
Architectural Support for System Software on Large-Scale Clusters
Found in: Parallel Processing, International Conference on
By Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini
Issue Date:August 2004
pp. 519-528
Scalable management of distributed resources is one of the major challenges in deployment of large-scale clusters. Management includes transparent fault tolerance, efficient allocation of resources, and support for all the needs of parallel computing: para...
 
Scalable NIC-based Reduction on Large-scale Clusters
Found in: SC Conference
By Adam Moody, Juan Fernandez, Fabrizio Petrini, Dhabaleswar K. Panda
Issue Date:November 2003
pp. 59
Many parallel algorithms require efficient reduction collectives. In response, researchers have designed algorithms considering a range of parameters including data size, system size, and communication characteristics. Throughout this past work, however, p...
 
Scalable Collective Communication on the ASCI Q Machine
Found in: High-Performance Interconnects, Symposium on
By Fabrizio Petrini, Juan Fernandez, Eitan Frachtenberg, Salvador Coll
Issue Date:August 2003
pp. 54
Scientific codes spend a considerable part of their run time executing collective communication operations. Such operations can also be critical for efficient resource management in large-scale machines. Therefore, scalable collective communicatio...
 
Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources
Found in: Parallel and Distributed Processing Symposium, International
By Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernandez
Issue Date:April 2003
pp. 85b
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes are dedicated for the duration of the run, or by gang sch...
 
Hardware- and Software-Based Collective Communication on the Quadrics Network
Found in: Network Computing and Applications, IEEE International Symposium on
By Fabrizio Petrini, Salvador Coll, Eitan Frachtenberg, Adolfy Hoisie
Issue Date:October 2001
pp. 0024
The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort, that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both ...
 
Gang Scheduling with Lightweight User-Level Communication
Found in: Parallel Processing Workshops, International Conference on
By Eitan Frachtenberg, Fabrizio Petrini, Salvador Coll, Wu-chun Feng
Issue Date:September 2001
pp. 0339
Abstract: In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network's unique capabilities, including a network interface card-ba...
 
The Quadrics Network (QsNet): High-Performance Clustering Technology
Found in: High-Performance Interconnects, Symposium on
By Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
Issue Date:August 2001
pp. 0125
Abstract: The Quadrics interconnection network (QsNet) contributes two novel innovations to the field of high-performance interconnects: (1) integration of the virtual-address spaces of individual nodes into a single, global, virtual-address space and (2) ...
 
Performance Evaluation of the Quadrics Interconnection Network
Found in: Parallel and Distributed Processing Symposium, International
By Fabrizio Petrini, Adolfy Hoisie, Wu-chun Feng, Richard Graham
Issue Date:April 2001
pp. 30165b
We present an initial performance evaluation of the Quadrics interconnection network (QsNET). We describe the main hardware and software features of QsNET of relevance to the system designer and to the end user. Actual benchmarks are performed on an experi...
 
A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs
Found in: Parallel Processing, International Conference on
By Adolfy Hoisie, Olaf Lubeck, Harvey Wasserman, Fabrizio Petrini, Hank Alme
Issue Date:August 2000
pp. 219
We propose and validate a closed-end, analytical, general, predictive performance model for applications based on wavefront algorithms on clusters of SMPs. Wavefront algorithms are ubiquitous in parallel computing, since they represent a means of enabling ...
 
Buffered Coscheduling: A New Methodology for Multitasking Parallel Jobs on Distributed Systems
Found in: Parallel and Distributed Processing Symposium, International
By Fabrizio Petrini, Wu-Chun Feng
Issue Date:May 2000
pp. 439
Buffered coscheduling is a scheduling methodology for time-sharing communicating processes in parallel and distributed systems. The methodology has two primary features: communication buffering and strobing. With communication buffering, communication gene...
 
Scheduling with Global Information in Distributed Systems
Found in: Distributed Computing Systems, International Conference on
By Fabrizio Petrini, Wu-chun Feng
Issue Date:April 2000
pp. 225
Buffered co-scheduling is a distributed scheduling methodology for time-sharing communicating processes in a distributed system, e.g., PC cluster. The principal mechanisms involved in this methodology are communication buffering and strobing. With communic...
 
A Throughput-Optimized Optical Network for Data-Intensive Computing
Found in: IEEE Micro
By Laurent Schares, Benjamin G. Lee, Fabio Checconi, Russell Budd, Alexander Rylyakov, Nicolas Dupuis, Fabrizio Petrini, Clint L. Schow, Pablo Fuentes, Oliver Mattes, Cyriel Minkenberg
Issue Date:September 2014
pp. 52-63
Data-intensive computing increasingly involves operations at the scale of an entire computing system, requiring quick and efficient processing of massive datasets. In this article, the authors present a circuit-switched network architecture, together with ...
 
Breaking the speed and scalability Barriers for Graph exploration on distributed-memory machines
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Fabio Checconi, Fabrizio Petrini, Jeremiah Willcock, Andrew Lumsdaine, Anamitra Roy Choudhury, Yogish Sabharwal
Issue Date:November 2012
pp. 1-12
In this paper, we describe the challenges involved in designing a family of highly-efficient Breadth-First Search (BFS) algorithms and in optimizing these algorithms on the latest two generations of Blue Gene machines, Blue Gene/P and Blue Gene/Q. With our...
 
Looking under the hood of the IBM Blue Gene/Q network
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Dong Chen, Noel Eisley, Philip Heidelberger, Sameer Kumar, Amith Mamidala, Fabrizio Petrini, Robert Senger, Yutaka Sugawara, Robert Walkup, Burkhard Steinmacher-Burow, Anamitra Choudhury, Yogish Sabharwal, Swati Singhal, Jeffrey J. Parker
Issue Date:November 2012
pp. 1-12
This paper explores the performance and optimization of the IBM Blue Gene/Q (BG/Q) five dimensional torus network on up to 16K nodes. The BG/Q hardware supports multiple dynamic routing algorithms and different traffic patterns may require different algori...
 
Top Picks from Hot Interconnects 2011: Petascale Network Architectures
Found in: IEEE Micro
By Torsten Hoefler, Patrick Geoffray, Fabrizio Petrini, Jesper Larsson Träff
Issue Date:January 2012
pp. 4-7
This introduction to the special issue discusses developments in the area of hot interconnects, specifically the three articles chosen from the 2011 Hot Interconnects conference.
 
Characterization of the Communication Patterns of Scientific Applications on Blue Gene/P
Found in: Parallel and Distributed Processing Workshops and PhD Forum, 2011 IEEE International Symposium on
By Pier Giorgio Raponi, Fabrizio Petrini, Robert Walkup, Fabio Checconi
Issue Date:May 2011
pp. 1017-1024
This paper examines the communication characteristics of a collection of scientific applications selected from the LLNL's Sequoia suite of benchmarks and the ANL's workload. By using an instrumentation library built on top of MPI we collect and characteriz...
 
Intra-Socket and Inter-Socket Communication in Multi-core Systems
Found in: IEEE Computer Architecture Letters
By Fabrizio Petrini, Virat Agarwal, Davide Pasetto
Issue Date:January 2010
pp. 13-16
The increasing computational and communication demands of the scientific and industrial communities require a clear understanding of the performance trade-offs involved in multi-core computing platforms. Such analysis can help application and toolkit devel...
 
Tools for Very Fast Regular Expression Matching
Found in: Computer
By Davide Pasetto, Fabrizio Petrini, Virat Agarwal
Issue Date:March 2010
pp. 50-58
No summary available.
 
SCAMPI: a scalable CAM-based algorithm for multiple pattern inspection
Found in: SC Conference
By Fabrizio Petrini, Virat Agarwal, Davide Pasetto
Issue Date:November 2009
pp. 1-11
String matching is one of the most compute intensive steps in a network intrusion detection system. The growing network rates, rapidly approaching 10 Gbits/sec, and the large number of signatures that need to be scanned concurrently pose very demanding cha...
 
Efficient and Scalable Hardware-Based Multicast in Fat-Tree Networks
Found in: IEEE Transactions on Parallel and Distributed Systems
By Salvador Coll, Francisco J. Mora, Jose Duato, Fabrizio Petrini
Issue Date:September 2009
pp. 1285-1298
This article presents an efficient and scalable mechanism to overcome the limitations of collective communication in switched interconnection networks in the presence of faults. Considering that current trends in supercomputing are moving toward massively ...
 
High-speed string searching against large dictionaries on the Cell/B.E. Processor
Found in: Parallel and Distributed Processing Symposium, International
By Daniele Paolo Scarpazza, Oreste Villa, Fabrizio Petrini
Issue Date:April 2008
pp. 1-12
Our digital universe is growing, creating exploding amounts of data which need to be searched, filtered and protected. String searching is at the core of the tools we use to curb this explosion, such as search engines, network intrusion detection systems, ...
 
Accelerating Real-Time String Searching with Multicore Processors
Found in: Computer
By Oreste Villa, Daniele Paolo Scarpazza, Fabrizio Petrini
Issue Date:April 2008
pp. 42-50
String searching is at the core of tools used to search, filter, and protect data, but this has become increasingly difficult to do in real time as communication speed grows. The authors present an optimization strategy for a popular algorithm that fully e...
 
Efficient Breadth-First Search on the Cell/BE Processor
Found in: IEEE Transactions on Parallel and Distributed Systems
By Daniele Paolo Scarpazza, Oreste Villa, Fabrizio Petrini
Issue Date:October 2008
pp. 1381-1395
Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But they also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the ...
 
Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine
Found in: Parallel and Distributed Processing Symposium, International
By Fabrizio Petrini, Gordon Fossum, Juan Fernandez, Ana Lucia Varbanescu, Mike Kistler, Michael Perrone
Issue Date:March 2007
pp. 62
The Cell Broadband Engine (BE) processor provides the potential to achieve an impressive level of performance for scientific applications. This level of performance can be reached by exploiting several dimensions of parallelism, such as thread-level parall...
 
Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors
Found in: Parallel and Distributed Processing Symposium, International
By Oreste Villa, Daniele Paolo Scarpazza, Fabrizio Petrini, Juan Fernandez Peinador
Issue Date:March 2007
pp. 63
Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But multi-core processors also bring an unprecedented level of complexity in algorithmic design and software development. In this paper...
 
Peak-Performance DFA-based String Matching on the Cell Processor
Found in: Parallel and Distributed Processing Symposium, International
By Daniele Paolo Scarpazza, Oreste Villa, Fabrizio Petrini
Issue Date:March 2007
pp. 444
The security of your data and of your network is in the hands of intrusion detection systems, virus scanners and spam filters, which are all critically based on string matching. But network links are getting faster and faster, and string matching is gettin...
 
STORM: Scalable Resource Management for Large-Scale Parallel Computers
Found in: IEEE Transactions on Computers
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Scott Pakin
Issue Date:December 2006
pp. 1572-1587
Although clusters are a popular form of high-performance computing, they remain more difficult to manage than sequential systems—or even symmetric multiprocessors. In this paper, we identify a small set of primitive mechanisms that are sufficiently general...
 
A Locality-Aware Cooperative Cache Management Protocol to Improve Network File System Performance
Found in: Distributed Computing Systems, International Conference on
By Song Jiang, Fabrizio Petrini, Xiaoning Ding, Xiaodong Zhang
Issue Date:July 2006
pp. 42
In a distributed environment the utilization of file buffer caches in different clients may vary greatly. Cooperative caching is used to increase cache utilization by coordinating the usage of distributed caches. Existing cooperative caching protocols main...
 
Guest Editors' Introduction: High-Performance Interconnects
Found in: IEEE Micro
By Fabrizio Petrini, Olav Lysne, Ron Brightwell
Issue Date:May 2006
pp. 7-9
We are pleased to introduce this special issue of IEEE Micro, featuring articles that capture the latest results on high-performance interconnection networks, including some of the best presentations from last summer's Hot Interconnects 13 at Stanford Univ...
 
Adaptive Parallel Job Scheduling with Flexible Coscheduling
Found in: IEEE Transactions on Parallel and Distributed Systems
By Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Issue Date:November 2005
pp. 1066-1077
Abstract: Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffe...
 
EtherNET vs. EtherNOT
Found in: High-Performance Interconnects, Symposium on
By Fabrizio Petrini
Issue Date:August 2005
pp. 117
No summary available.
   
QsNetII: Defining High-Performance Network Design
Found in: IEEE Micro
By Jon Beecroft, David Addison, David Hewson, Moray McLaren, Duncan Roweth, Fabrizio Petrini, Jarek Nieplocha
Issue Date:July 2005
pp. 34-47
QsNetII optimizes interprocessor communication in systems built from standard server building blocks. Its short-message processing unit permits fast injection of small messages, providing ultra-low latency and scalability to thousands of nodes.
 
System-Level Fault-Tolerance in Large-Scale Parallel Machines with Buffered Coscheduling
Found in: Parallel and Distributed Processing Symposium, International
By Fabrizio Petrini, Kei Davis, José Carlos Sancho
Issue Date:April 2004
pp. 209b
As the number of processors for multi-teraflop systems grows to tens of thousands, with proposed petaflops systems likely to contain hundreds of thousands of processors, the assumption of fully reliable hardware has been abandoned. Although the mean time b...
 
On the Feasibility of Incremental Checkpointing for Scientific Computing
Found in: Parallel and Distributed Processing Symposium, International
By José Carlos Sancho, Fabrizio Petrini, Greg Johnson, Juan Fernández, Eitan Frachtenberg
Issue Date:April 2004
pp. 58b
In the near future large-scale parallel computers will feature hundreds of thousands of processing nodes. In such systems, fault tolerance is critical as failures will occur very often. Checkpointing and rollback recovery has been extensively studied as an...
 