Issue No.03 - Fall (1995 vol.3)
In heterogeneous computing environments, computational resources can have a nonuniform distribution that changes over time. To execute in such an environment, many irregular and loosely synchronous data-parallel applications must be carefully mapped. This article examines algorithms that provide this mapping by efficiently partitioning the computational graphs of these applications. Heterogeneity has become commonplace in high-performance computing environments. In the future most computing environments will consist of a cluster of nodes connected by a high-speed interconnection network. Node architectures will include high-performance SIMD and MIMD parallel computers as well as numerous high-performance workstations. In a heterogeneous environment, users can pool many computational resources to create a large virtual machine. This environment can be nonuniform -- that is, the machines or processors can have different computational powers. However, the pool of resources might change over the computation's lifetime because of machine failures or differing use patterns. It should be possible to add or remove resources without significantly affecting the other machines or changing the existing software. In such an adaptive environment, an individual machine could either be dedicated to a single user's computation or shared by users. The former strategy has the advantage that each machine has static computing capability, while the latter has the advantage of a higher rate of use. In this article we'll examine the mapping requirements for the parallelization of a large class of irregular and loosely synchronous data-parallel applications on nonuniform and adaptive environments. The computational structure of these applications can be described as a computational graph. In such a graph, nodes represent computational tasks and edges describe the communication between tasks. For many applications, the graph's vertices correspond to 2D and 3D coordinates, and the interaction between computations is limited to physically proximate vertices. Recursive coordinate bisection, index-based mapping, and recursive spectral bisection can exploit these properties to partition such applications. Essentially, these algorithms cluster proximate points together to form a partition such that the numbers of vertices attached to every partition are equal. Other researchers have used these algorithms to map graphs onto uniform parallel machines. We'll evaluate how the algorithms partition computational graphs on a simulation of a cluster of machines constituting a static, nonuniform environment. (In a static environment, computational resources are fixed throughout the completion of all tasks.) The algorithms assume that an interconnection network connects all the processors and that the cost of unit communication is the same between all the processors. (A bus is an example of such a network.) Although our algorithms specifically target a network-connected cluster of workstations, the issues are similar for parallelizing such applications on a network of machines. We'll also show how to use or extend these algorithms for an adaptive environment. Mapping graph vertices onto a 1D space can facilitate extremely fast remapping when the environment changes. This simple remapping achieves acceptable partitioning, though poorer than with mapping from scratch.
Maher Kaddoura, Chao-Wei Ou, Sanjay Ranka, "Partitioning Unstructured Computational Graphs for Nonuniform and Adaptive Environments", IEEE Concurrency (out of print), vol.3, no. 3, pp. 63-69, Fall 1995, doi:10.1109/M-PDT.1995.414844