Programming Models for High-Performance Computing
Found in: 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
By Marc Snir
Issue Date:May 2013
pp. 1
The first version of the MPI standard was released in November 1993. At the time, many of the authors of this standard, myself included, viewed MPI as a temporary solution, to be used until it is replaced by a good programming language for distributed memo...
 
Ubiquitous Parallel Computing from Berkeley, Illinois, and Stanford
Found in: IEEE Micro
By Bryan Catanzaro, Armando Fox, Kurt Keutzer, David Patterson, Bor-Yiing Su, Marc Snir, Kunle Olukotun, Pat Hanrahan, Hassan Chafi
Issue Date:March 2010
pp. 41-55
The ParLab at Berkeley, UPCRC-Illinois, and the Pervasive Parallel Laboratory at Stanford are studying how to make parallel programming succeed given industry's recent shift to multicore computing. All three centers assume that future microprocess...
 
Programming for Exascale Computers
Found in: Computing in Science & Engineering
By William Gropp,Marc Snir
Issue Date:November 2013
pp. 27-35
Exascale systems will present programmers with many challenges. The authors review the parallel programming models that are appropriate for such systems and the challenges that implementations of those models face in an exascale system. They also discuss t...
 
Fault prediction under the microscope: A closer look into HPC systems
Found in: 2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
By Ana Gainaru,Franck Cappello,Marc Snir,William Kramer
Issue Date:November 2012
pp. 1-11
A large percentage of computing capacity in today's large high-performance computing systems is wasted because of failures. Consequently current research is focusing on providing fault tolerance strategies that aim to minimize fault's effects on applicatio...
 
Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O
Found in: 2012 IEEE International Conference on Cluster Computing (CLUSTER)
By Matthieu Dorier,Gabriel Antoniu,Franck Cappello,Marc Snir,Leigh Orf
Issue Date:September 2012
pp. 155-163
With exascale computing on the horizon, the performance variability of I/O systems represents a key challenge in sustaining high performance. In many HPC applications, I/O is concurrently performed by all processes, which leads to I/O bursts. This causes r...
 
HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications
Found in: 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Amina Guermouche,Thomas Ropars,Marc Snir,Franck Cappello
Issue Date:May 2012
pp. 1216-1227
High performance computing will probably reach exascale in this decade. At this scale, mean time between failures is expected to be a few hours. Existing fault tolerant protocols for message passing applications will not be efficient anymore since they eit...
 
Comparing archival policies for Blue Waters
Found in: High-Performance Computing, International Conference on
By Franck Cappello,Mathias Jacquelin,Loris Marchal,Yves Robert,Marc Snir
Issue Date:December 2011
pp. 1-10
This paper introduces two new tape archival policies that can improve tape archive performance in certain regimes, compared to the classical RAIT (Redundant Array of Independent Tapes) policy. The first policy, PARALLEL, still requires as many parallel tap...
 
Optimizing the Barnes-Hut algorithm in UPC
Found in: SC Conference
By Junchao Zhang,Babak Behzad,Marc Snir
Issue Date:November 2011
pp. 1-11
PGAS languages' support of a global name space facilitates the expression of parallel algorithms, since communication is implicit. This is especially convenient when writing irregular applications with data-dependent, dynamically changing communication pat...
 
Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications
Found in: Parallel and Distributed Processing Symposium, International
By Amina Guermouche,Thomas Ropars,Elisabeth Brunet,Marc Snir,Franck Cappello
Issue Date:May 2011
pp. 989-1000
As reported by many recent studies, the mean time between failures of future post-petascale supercomputers is likely to reduce, compared to the current situation. The most popular fault tolerance approach for MPI applications on HPC Platforms relies on coo...
 
ESoftCheck: Removal of Non-vital Checks for Fault Tolerance
Found in: Code Generation and Optimization, IEEE/ACM International Symposium on
By Jing Yu, Maria Jesus Garzaran, Marc Snir
Issue Date:March 2009
pp. 35-46
As semiconductor technology scales into the deep submicron regime the occurrence of transient or soft errors will increase. This will require new approaches to error detection. Software checking approaches are attractive because they require little hardwar...
 
Efficient software checking for fault tolerance
Found in: Parallel and Distributed Processing Symposium, International
By Jing Yu, Maria Jesus Garzaran, Marc Snir
Issue Date:April 2008
pp. 1-5
Dramatic increases in the number of transistors that can be integrated on a chip make processors more susceptible to radiation-induced transient errors. For commodity chips which are cost- and energy-constrained, software approaches can play a major role f...
 
Programming Patterns for Architecture-Level Software Optimizations on Frequent Pattern Mining
Found in: Data Engineering, International Conference on
By Mingliang Wei, Changhao Jiang, Marc Snir
Issue Date:April 2007
pp. 336-345
One very important application in the data mining domain is frequent pattern mining. Various authors have worked on improving the efficiency of this computation, mostly focusing on algorithm-level improvement. More recent work has explored architecture spe...
 
Automatic Tuning Matrix Multiplication Performance on Graphics Hardware
Found in: Parallel Architectures and Compilation Techniques, International Conference on
By Changhao Jiang, Marc Snir
Issue Date:September 2005
pp. 185-196
In order to utilize the tremendous computing power of graphics hardware and to automatically adapt to the fast and frequent changes in its architecture and performance characteristics, this paper implements an automatic tuning system to generate h...
 
Generalized Communicators in the Message Passing Interface
Found in: IEEE Transactions on Parallel and Distributed Systems
By Erik D. Demaine, Ian Foster, Carl Kesselman, Marc Snir
Issue Date:June 2001
pp. 610-616
We propose extensions to the Message Passing Interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoin...
 
From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems
Found in: SC Conference
By C. Eric Wu, Anthony Bolmarcich, Marc Snir, David Wootton, Farid Parpia, Anthony Chan, Ewing Lusk, William Gropp
Issue Date:November 2000
pp. 50
In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for framework extensions, utilities for merging and stati...
 
Java For Numerically Intensive Computing: From Flops To Gigaflops
Found in: Frontiers of Massively Parallel Processing, Symposium on the
By Samuel P. Midkiff, Jose E. Moreira, Marc Snir
Issue Date:February 1999
pp. 251
Java is not thought of as being competitive with Fortran for numerical programming. In this paper, we discuss technologies that can and will deliver Fortran-like performance in Java. These techniques include new and existing compiler technologies, the expl...
 
Generalized Communicators in the Message Passing Interface
Found in: MPI Developers Conference
By Ian Foster, Carl Kesselman, Marc Snir
Issue Date:July 1996
pp. 42
We propose extensions to the Message Passing Interface (MPI) that generalize the MPI communicator concept to allow multiple communication endpoints per process, dynamic creation of endpoints, and the transfer of endpoints between processes. The generalized...
 
Randomized Routing with Shorter Paths
Found in: IEEE Transactions on Parallel and Distributed Systems
By Eli Upfal, Sergio Felperin, Marc Snir
Issue Date:April 1996
pp. 356-362
We study in this paper the use of randomized routing in multistage networks. While log N additional randomizing stages are needed to break...
 
Parallel I/O: Getting Ready for Prime Time
Found in: IEEE Concurrency
By Dan Reed, Charles Catlett, Alok Choudhary, David Kotz, Marc Snir
Issue Date:June 1995
pp. 64-71
No summary available.
 
CCL: A Portable and Tunable Collective Communication Library for Scalable Parallel Computers
Found in: IEEE Transactions on Parallel and Distributed Systems
By Vasanth Bala, Jehoshua Bruck, Robert Cypher, Pablo Elustondo, Alex Ho, Ching-Tien Ho, Shlomo Kipnis, Marc Snir
Issue Date:February 1995
pp. 154-164
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a conve...
 
MPI-F: An Efficient Implementation of MPI on IBM-SP1
Found in: Parallel Processing, International Conference on
By Hubertus Franke, Peter Hochschild, Pratap Pattnaik, Marc Snir
Issue Date:August 1994
pp. 197-201
This article introduces MPI-F, an efficient implementation of MPI on the IBM-SP1 distributed memory cluster. After discussing the novel and key concepts of MPI and how they relate to an implementation, the MPI-F system architecture is outlined in detail. Al...
 
Using Visualization Tools to Understand Concurrency
Found in: IEEE Software
By Dror Zernick, Marc Snir, Dalia Malki
Issue Date:May 1992
pp. 87-92
A visualization tool that provides an aggregate view of execution through a graph of events called the causality graph, which is suitable for systems with hundreds or thousands of processors, coarse-grained parallelism, and for a language that mak...
 
Hierarchical memory with block transfer
Found in: Foundations of Computer Science, Annual IEEE Symposium on
By Alok Aggarwal, Ashok K. Chandra, Marc Snir
Issue Date:October 1987
pp. 204-216
In this paper we introduce a model of Hierarchical Memory with Block Transfer (BT for short). It is like a random access machine, except that access to location x takes time f(x), and a block of consecutive locations can be copied from memory to memory, ta...
 
Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters
Found in: IEEE Transactions on Parallel and Distributed Systems
By Junchao Zhang,Babak Behzad,Marc Snir
Issue Date:June 2014
pp. 1
We describe in this paper an implementation of the Barnes-Hut algorithm on multicore clusters. Based on a partitioned global address space (PGAS) library, the design integrates intranode multithreading and internode one-sided communication, exemplifying a ...
 
Programming for Exascale Computers
Found in: Computing in Science & Engineering
By William Gropp,Marc Snir
Issue Date:October 2013
pp. 1
Exascale systems will present programmers with many challenges. We review the parallel programming models that are appropriate for such systems and the challenges that implementations of those models face on an exascale system. We discuss the feasibility o...
 
The power of parallel prefix
Found in: IEEE Transactions on Computers
By Clyde P. Kruskal,Larry Rudolph,Marc Snir
Issue Date:October 1985
pp. 965-968
The prefix computation problem is to compute all n initial products a1 ∘ … ∘ ai, i = 1, …, n, of a set of n elements, where ∘ is an associative operation. We present an O(((log n)/log(2n/p)) · (n/p)) time deterministic parallel algorithm using p ≤ n proces...
   
Taming parallel I/O complexity with auto-tuning
Found in: Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13)
By Joseph Huchette, Ruth Aydt, Surendra Byna, Babak Behzad, Huong Vu Thanh Luu, Marc Snir, Quincey Koziol, Surendra Prabhat
Issue Date:November 2013
pp. 1-12
We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify e...
     
Enabling MPI interoperability through flexible communication endpoints
Found in: Proceedings of the 20th European MPI Users' Group Meeting (EuroMPI '13)
By David Goodell, Douglas Miller, James Dinan, Marc Snir, Pavan Balaji, Rajeev Thakur
Issue Date:September 2013
pp. 13-18
The current MPI model defines a one-to-one relationship between MPI processes and MPI ranks. This model captures many use cases effectively, such as one MPI process per core and one MPI process per node. However, this semantic has limited interoperability ...
     
Programming models for extreme-scale computing
Found in: Proceedings of the 2013 ACM symposium on Principles of distributed computing (PODC '13)
By Marc Snir
Issue Date:July 2013
pp. 3-3
The first version of the MPI standard was released in November 1993. At the time, many of the authors of this standard, myself included, viewed MPI as a temporary solution, to be used until it is replaced by a good programming language for distributed memo...
     
NUMA-aware shared-memory collective communication for MPI
Found in: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing (HPDC '13)
By Marc Snir, Shigang Li, Torsten Hoefler
Issue Date:June 2013
pp. 85-96
As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimizations of MPI collectives for clusters of NUMA nodes. We develo...
     
Automatic datatype generation and optimization
Found in: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '12)
By Torsten Hoefler, Fredrik Kjolstad, Marc Snir
Issue Date:February 2012
pp. 327-328
Many high performance applications spend considerable time packing noncontiguous data into contiguous communication buffers. MPI Datatypes provide an alternative by describing noncontiguous data layouts. This allows sophisticated hardware to retrieve data ...
     
Performance modeling for systematic performance tuning
Found in: State of the Practice Reports (SC '11)
By Marc Snir, Torsten Hoefler, William Gropp, William Kramer
Issue Date:November 2011
pp. 1-12
The performance of parallel scientific applications depends on many factors which are determined by the execution environment and the parallel application. Especially on large parallel systems, it is too expensive to explore the solution space with series ...
     
Performance engineering: a must for petascale and beyond
Found in: Proceedings of the third international workshop on Large-scale system and application performance (LSAP '11)
By Marc Snir, Torsten Hoefler
Issue Date:June 2011
pp. 1-2
We discuss the need for a more principled approach to the management of the performance of applications for petascale platforms and outline some initial successes, related to the Blue Waters project.
     
Generic topology mapping strategies for large-scale parallel architectures
Found in: Proceedings of the international conference on Supercomputing (ICS '11)
By Marc Snir, Torsten Hoefler
Issue Date:May 2011
pp. 75-84
The steadily increasing number of nodes in high-performance computing systems and the technology and power constraints lead to sparse network topologies. Efficient mapping of application communication patterns to the network topology gains importance as sy...
     
Transformation for class immutability
Found in: Proceeding of the 33rd international conference on Software engineering (ICSE '11)
By Danny Dig, Fredrik Kjolstad, Gabriel Acevedo, Marc Snir
Issue Date:May 2011
pp. 61-70
It is common for object-oriented programs to have both mutable and immutable classes. Immutable classes simplify programming because the programmer does not have to reason about side-effects. Sometimes programmers write immutable classes from scratch, other...
     
Computer and information science and engineering: one discipline, many specialties
Found in: Communications of the ACM
By Marc Snir
Issue Date:March 2011
pp. 38-43
Mathematics is no longer the only foundation for computing and information research and education in academia.
     
Advice to members seeking ACM distinction
Found in: Communications of the ACM
By Marc Snir, Telle Whitney
Issue Date:July 2010
pp. 40-41
Reflections on the (experimental) scientific method in computer science.
     
Ghost Cell Pattern
Found in: Proceedings of the 2010 Workshop on Parallel Programming Patterns (ParaPLoP '10)
By Fredrik Berg Kjolstad, Marc Snir
Issue Date:March 2010
pp. 1-9
Many problems consist of a structured grid of points that are updated repeatedly based on the values of a fixed set of neighboring points in the same grid. To parallelize these problems we can geometrically divide the grid into chunks that are processed by...
     
Shared memory programming on distributed memory systems
Found in: Proceedings of the Third Conference on Partitioned Global Address Space Programing Models (PGAS '09)
By Marc Snir
Issue Date:October 2009
pp. 1-1
From the literature, it is known that backward polygon beam tracing and other light volume methods are well suited to gather path coherency from specular scattering surfaces. This is of course useful for modelling and efficiently simulating caustics (LS+DE...
     
The NYU ultracomputer---designing a MIMD, shared-memory parallel machine
Found in: 25 years of the international symposia on Computer architecture (selected papers) (ISCA '98)
By Allan Gottlieb, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, Marc Snir, Ralph Grishman
Issue Date:June 1998
pp. 239-254
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In ...
     
Issues and directions in scalable parallel computing
Found in: Proceedings of the twelfth annual ACM symposium on Principles of distributed computing (PODC '93)
By Marc Snir
Issue Date:August 1993
pp. 21-28
In [4] a randomized algorithm for mutual exclusion with bounded waiting, employing a logarithmic sized shared variable, was given. Saias and Lynch [5] pointed out that the adversary scheduler postulated in the above paper can observe the behavior of proces...
     
Randomized routing with shorter paths
Found in: Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures (SPAA '93)
By Eli Upfal, Marc Snir, Sergio Felperin
Issue Date:June 1993
pp. 283-292
We propose a new design for highly concurrent Internet services, which we call the staged event-driven architecture (SEDA). SEDA is intended to support massive concurrency demands and simplify the construction of well-conditioned services. In SEDA, applica...
     
Scalable parallel computing: the IBM 9076 scalable POWERparallel 1
Found in: Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures (SPAA '93)
By Marc Snir
Issue Date:June 1993
pp. 42
We propose a new design for highly concurrent Internet services, which we call the staged event-driven architecture (SEDA). SEDA is intended to support massive concurrency demands and simplify the construction of well-conditioned services. In SEDA, applica...
     
Computer architectures and programming models for scalable parallel computing (abstract)
Found in: Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL '93)
By Marc Snir
Issue Date:March 1993
pp. 1
We describe a method for using abstraction to reduce the complexity of temporal logic model checking. The basis of this method is a way of constructing an abstract model of a program without ever examining the corresponding unabstracted model. We show how ...
     
Efficient synchronization of multiprocessors with shared memory
Found in: ACM Transactions on Programming Languages and Systems (TOPLAS)
By Clyde P. Kruskal, Larry Rudolph, Marc Snir
Issue Date:January 1988
pp. 579-601
A new formalism is given for read-modify-write (RMW) synchronization operations. This formalism is used to extend the memory reference combining mechanism introduced in the NYU Ultracomputer, to arbitrary RMW operations. A formal correctness proof of this ...
     
Computing on an anonymous ring
Found in: Journal of the ACM (JACM)
By Hagit Attiya, Manfred K. Warmuth, Marc Snir
Issue Date:January 1988
pp. 845-875
The computational capabilities of a system of n indistinguishable (anonymous) processors arranged on a ring in the synchronous and asynchronous models of distributed computation are analyzed. A precise characterization of the functions that can be computed...
     
Efficient and correct execution of parallel programs that share memory
Found in: ACM Transactions on Programming Languages and Systems (TOPLAS)
By Dennis Shasha, Marc Snir
Issue Date:January 1988
pp. 282-312
In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program on such machines consists of many sequential program segments...
     
Random walks on weighted graphs and applications to on-line algorithms
Found in: Journal of the ACM (JACM)
By Don Coppersmith, Marc Snir, Peter Doyle, Prabhakar Raghavan
Issue Date:January 1988
pp. 421-453
In a priority-based computer system, besides the regular jobs, an additional job (referred to as job A) is invoked infrequently but requires a significant amount of CPU time. To avoid CPU hogging, job A receives (up to) a fixed amount of CPU time whenever ...
     
A message passing standard for MPP and workstations
Found in: Communications of the ACM
By David Walker, Jack J. Dongarra, Marc Snir, Steve W. Otto
Issue Date:January 1988
pp. 84-90
The online Risks Forum has long been a hotbed for discussions of the relative merits of openness relating to the dissemination of knowledge about security vulnerabilities. The debate has now been rekindled, and is summarized here.
     
Efficient synchronization of multiprocessors with shared memory
Found in: Proceedings of the fifth annual ACM symposium on Principles of distributed computing (PODC '86)
By Clyde P Kruskal, Larry Rudolph, Marc Snir
Issue Date:August 1986
pp. 218-228
An efficient distributed algorithm to detect deadlocks in distributed and dynamically changing systems is presented. In our model, processes can request any N available resources from a pool of size M. This is a generalization of the well-known AND-OR requ...
     