High-efficiency server design
Found in: SC Conference
By Eitan Frachtenberg, Ali Heydari, Harry Li, Amir Michael, Jacob Na, Avery Nisbet, Pierluigi Sarti
Issue Date:November 2011
pp. 1-27
Large-scale data centers consume megawatts in power and cost hundreds of millions of dollars to equip. Reducing the energy and cost footprint of servers can therefore have substantial impact. Web, Grid, and cloud servers in particular can be hard to optimi...
 
BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers
Found in: SC Conference
By Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini
Issue Date:November 2003
pp. 57
Buffered CoScheduled MPI (BCS-MPI) introduces a new approach to design the communication layer for large-scale parallel machines. The emphasis of BCS-MPI is on the global coordination of a large number of communicating processes rather than on the traditio...
 
STORM: Lightning-Fast Resource Management
Found in: SC Conference
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Scott Pakin, Salvador Coll
Issue Date:November 2002
pp. 46
Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore, as cluster sizes increase, the quality of the resource-ma...
 
Scalable Resource Management in High Performance Computers
Found in: Cluster Computing, IEEE International Conference on
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Salvador Coll
Issue Date:September 2002
pp. 305
Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly-available computers. Although many hardware solutions are available today, the largest challenge in making large-scale clusters usable...
 
Performance Evaluation of I/O Traffic and Placement of I/O Nodes on a High Performance Network
Found in: Parallel and Distributed Processing Symposium, International
By Salvador Coll, Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie
Issue Date:April 2002
pp. 0165
A common trend in the design of large-scale clusters is to use a high-performance data network to integrate the processing nodes in a single parallel computer. In these systems the performance of the interconnect can be a limiting factor for the input/outp...
 
Characterizing Facebook's Memcached Workload
Found in: IEEE Internet Computing
By Yuehai Xu, Eitan Frachtenberg, Song Jiang, Mike Paleczny
Issue Date:March 2014
pp. 41-49
Memcached is one of the world's largest key-value deployments. This article analyzes the Memcached workload at Facebook, looking at server-side performance, request composition, caching efficacy, and key locality. The observations presented here lead to se...
 
Development and Deployment at Facebook
Found in: IEEE Internet Computing
By Dror G. Feitelson, Eitan Frachtenberg, Kent L. Beck
Issue Date:July 2013
pp. 8-17
Internet companies such as Facebook operate in a "perpetual development" mindset. This means that the website continues to undergo development with no predefined final objective, and that new developments are deployed so that users can enjoy them...
 
Holistic Datacenter Design in the Open Compute Project
Found in: Computer
By Eitan Frachtenberg
Issue Date:July 2012
pp. 83-85
Facebook's Open Compute Project lets the community benefit from and contribute to improvements in power and water usage effectiveness, cost, and operation.
 
JSSPP Workshop Introduction
Found in: 2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
By Walfredo Cirne, Narayan Desai, Eitan Frachtenberg, Uwe Schwiegelshohn
Issue Date:May 2012
pp. 2271
No summary available.
 
STORM: Scalable Resource Management for Large-Scale Parallel Computers
Found in: IEEE Transactions on Computers
By Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Scott Pakin
Issue Date:December 2006
pp. 1572-1587
Although clusters are a popular form of high-performance computing, they remain more difficult to manage than sequential systems—or even symmetric multiprocessors. In this paper, we identify a small set of primitive mechanisms that are sufficiently general...
 
Adaptive Parallel Job Scheduling with Flexible Coscheduling
Found in: IEEE Transactions on Parallel and Distributed Systems
By Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Issue Date:November 2005
pp. 1066-1077
Abstract: Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffe...
 
Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters
Found in: Parallel and Distributed Processing Symposium, International
By Juan Fernández, Fabrizio Petrini, Eitan Frachtenberg
Issue Date:April 2005
pp. 300a
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their communication are tightly scheduled at a very fine granularity. Thus, BCS-MPI pr...
 
Architectural Support for System Software on Large-Scale Clusters
Found in: Parallel Processing, International Conference on
By Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini
Issue Date:August 2004
pp. 519-528
Scalable management of distributed resources is one of the major challenges in deployment of large-scale clusters. Management includes transparent fault tolerance, efficient allocation of resources, and support for all the needs of parallel computing: para...
 
On the Feasibility of Incremental Checkpointing for Scientific Computing
Found in: Parallel and Distributed Processing Symposium, International
By José Carlos Sancho, Fabrizio Petrini, Greg Johnson, Juan Fernández, Eitan Frachtenberg
Issue Date:April 2004
pp. 58b
In the near future large-scale parallel computers will feature hundreds of thousands of processing nodes. In such systems, fault tolerance is critical as failures will occur very often. Checkpointing and rollback recovery has been extensively studied as an...
 
Scalable Collective Communication on the ASCI Q Machine
Found in: High-Performance Interconnects, Symposium on
By Fabrizio Petrini, Juan Fernandez, Eitan Frachtenberg, Salvador Coll
Issue Date:August 2003
pp. 54
Scientific codes spend a considerable part of their run time executing collective communication operations. Such operations can also be critical for efficient resource management in large-scale machines. Therefore, scalable collective communicatio...
 
Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources
Found in: Parallel and Distributed Processing Symposium, International
By Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernandez
Issue Date:April 2003
pp. 85b
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes are dedicated for the duration of the run, or by gang sch...
 
The Quadrics Network: High-Performance Clustering Technology
Found in: IEEE Micro
By Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
Issue Date:January 2002
pp. 46-57
The Quadrics network extends the native operating system in processing nodes with a network operating system and specialized hardware support in the network interface. Doing so integrates individual node's address spaces into a single, global, vir...
 
Hardware- and Software-Based Collective Communication on the Quadrics Network
Found in: Network Computing and Applications, IEEE International Symposium on
By Fabrizio Petrini, Salvador Coll, Eitan Frachtenberg, Adolfy Hoisie
Issue Date:October 2001
pp. 0024
The efficient implementation of collective communication patterns in a parallel machine is a challenging design effort that requires the solution of many problems. In this paper we present an in-depth description of how the Quadrics network supports both ...
 
Using Multirail Networks in High-Performance Clusters
Found in: Cluster Computing, IEEE International Conference on
By Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Issue Date:October 2001
pp. 15
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance clusters. We present an extensive experimental comparison of the behavior of variou...
 
Gang Scheduling with Lightweight User-Level Communication
Found in: Parallel Processing Workshops, International Conference on
By Eitan Frachtenberg, Fabrizio Petrini, Salvador Coll, Wu-chun Feng
Issue Date:September 2001
pp. 0339
Abstract: In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network's unique capabilities, including a network interface card-ba...
 
The Quadrics Network (QsNet): High-Performance Clustering Technology
Found in: High-Performance Interconnects, Symposium on
By Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
Issue Date:August 2001
pp. 0125
Abstract: The Quadrics interconnection network (QsNet) contributes two novel innovations to the field of high-performance interconnects: (1) integration of the virtual-address spaces of individual nodes into a single, global, virtual-address space and (2) ...
 
Process Scheduling for the Parallel Desktop
Found in: Parallel Architectures, Algorithms, and Networks, International Symposium on
By Eitan Frachtenberg
Issue Date:December 2005
pp. 132-139
No summary available.
 