Avoiding hot-spots on two-level direct networks
Found in: SC Conference
By Abhinav Bhatele, Nikhil Jain, William D. Gropp, Laxmikant V. Kale
Issue Date: November 2011
pp. 1-11
A low-diameter, fast interconnection network is going to be a prerequisite for building exascale machines. A two-level direct network has been proposed by several groups as a scalable design for future machines. IBM's PERCS topology and the dragonfly netwo...
Performance Modeling and Tuning of an Unstructured Mesh CFD Application
Found in: SC Conference
By William D. Gropp, Dinesh K. Kaushik, David E. Keyes, Barry F. Smith
Issue Date: November 2000
pp. 34
This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific...
Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes
Found in: Parallel and Distributed Processing Symposium, International
By Abhinav Bhatele, Pritish Jetley, Hormozd Gahvari, Lukasz Wesolowski, William D. Gropp, Laxmikant Kalé
Issue Date: May 2011
pp. 80-91
The first Teraflop/s computer, the ASCI Red, became operational in 1997, and it took more than 11 years for a Petaflop/s performance machine, the IBM Roadrunner, to appear on the Top500 list. Efforts have begun to study the hardware and software challenges...
Weighted locality-sensitive scheduling for mitigating noise on multi-core clusters
Found in: High-Performance Computing, International Conference on
By Vivek Kale, Abhinav Bhatele, William D. Gropp
Issue Date: December 2011
pp. 1-10
Recent studies have shown that operating system (OS) interference, popularly called OS noise, can be a significant problem as we scale to a large number of processors. One solution for mitigating noise is to turn off certain OS services on the machine. Howe...
Enabling the Next Generation of Scalable Clusters
Found in: Cluster Computing and the Grid, IEEE International Symposium on
By William D. Gropp
Issue Date: May 2010
pp. 3
No summary available.
Exploring the Relationship Between Parallel Application Run-Time Variability and Network Performance in Clusters
Found in: Local Computer Networks, Annual IEEE Conference on
By Jeffrey J. Evans, Cynthia S. Hood, William D. Gropp
Issue Date: October 2003
pp. 538
Highly variable parallel application execution time is a persistent issue in cluster computing environments, and can be particularly acute in systems composed of Networks of Workstations (NOWs). We are looking at this issue in terms of consistency. In part...
LACIO: A New Collective I/O Strategy for Parallel I/O Systems
Found in: Parallel and Distributed Processing Symposium, International
By Yong Chen, Xian-He Sun, Rajeev Thakur, Philip C. Roth, William D. Gropp
Issue Date: May 2011
pp. 794-804
Parallel applications benefit considerably from the rapid advance of processor architectures and the available massive computational capability, but their performance suffers from large latency of I/O accesses. The poor I/O performance has been attributed ...
Self-Consistent MPI Performance Guidelines
Found in: IEEE Transactions on Parallel and Distributed Systems
By Jesper Larsson Träff, William D. Gropp, Rajeev Thakur
Publication Date: July 2009
pp. 698-709
Message passing using the Message-Passing Interface (MPI) is at present the most widely adopted framework for programming parallel applications for distributed memory and clustered parallel systems. For reasons of (universal) implementability, the MPI stan...
A Decoupled Execution Paradigm for Data-Intensive High-End Computing
Found in: 2012 IEEE International Conference on Cluster Computing (CLUSTER)
By Yong Chen, Chao Chen, Xian-He Sun, William D. Gropp, Rajeev Thakur
Issue Date: September 2012
pp. 200-208
High-end computing (HEC) applications in critical areas of science and technology tend to be more and more data intensive. I/O has become a vital performance bottleneck of modern HEC practice. Conventional HEC execution paradigms, however, are computing-ce...
Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization
Found in: 2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
By Simplice Donfack, Laura Grigori, William D. Gropp, Vivek Kale
Issue Date: May 2012
pp. 496-507
We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show tha...
Software for Petascale Computing Systems
Found in: Computing in Science and Engineering
By William D. Gropp
Issue Date: September 2009
pp. 17-21
Developing software for highly scalable systems with nearly a million processors or cores raises unique challenges. To succeed, application developers must reconsider both their code's structure and the tools they use to develop, tune, and run tha...
Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications
Found in: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
By Vivek Kale, Todd Gamblin, Torsten Hoefler, Bronis R. de Supinski, William D. Gropp
Issue Date: November 2012
pp. 1392
Due to the strict communication dependences in the global collective communication of MPI applications, noise that delays one process can amplify across processes in a large run. The amount of overhead that noise amplification causes can increase dramatica...
Performance modeling as the key to extreme scale computing
Found in: Proceedings of the international conference on Supercomputing (ICS '11)
By William D. Gropp
Issue Date: May 2011
pp. 213
Parallel computing is primarily about achieving greater performance than is possible without using parallelism. Especially for the high-end, where systems cost tens to hundreds of millions of dollars, making the best use of these valuable and scarce system...
An adaptive performance modeling tool for GPU architectures
Found in: Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel computing (PPoPP '10)
By Matthieu Delahaye, Sanjay J. Patel, Sara S. Baghsorkhi, Wen-mei W. Hwu, William D. Gropp
Issue Date: January 2010
pp. 105-114
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the ...