SC Conference (2006)
Tampa, Florida
Nov. 11, 2006 to Nov. 17, 2006
ISBN: 0-7695-2700-0
pp: 29
Daniel Nurmi , University of California, Santa Barbara
Anirban Mandal , Rice University
John Brevik , University of California, Santa Barbara
Chuck Koelbel , Rice University
Rich Wolski , University of California, Santa Barbara
Ken Kennedy , Rice University
Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would assign a one-to-one mapping of applications to machines. Modern HPC users have simultaneous access to a large number of individual machines and are beginning to make use of all of them for single-application execution cycles. One method that application developers have devised in order to take advantage of such systems is to organize an entire application execution cycle as a workflow. The scheduling of such workflows has been the topic of a great deal of research in the past few years and, although very sophisticated algorithms have been devised, a very specific aspect of these distributed systems, namely that most supercomputing resources employ batch queue scheduling software, has heretofore been omitted from consideration, presumably because it is difficult to model accurately. In this work, we augment an existing workflow scheduler through the introduction of methods which make accurate predictions of both the performance of the application on specific hardware and the amount of time individual workflow tasks will spend waiting in batch queues. Our results show that although a workflow scheduler alone may choose correct task placement based on data locality or network connectivity, this benefit is often compromised by the fact that most jobs submitted to current systems must wait in overcommitted batch queues for a significant portion of time. However, incorporating the enhancements we describe improves workflow execution time in settings where batch queues impose significant delays on constituent workflow tasks.
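The idea of combining the two predictions can be illustrated with a minimal sketch. It is not the scheduler evaluated in the paper: the `Estimate` type, the `pick_resource` helper, the resource names, and all numbers are hypothetical, and the sketch assumes that the predicted completion time of a task is simply the sum of its predicted queue wait, data staging time, and run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Hypothetical predictions for one task on one resource, in seconds."""
    resource: str
    queue_wait: float  # predicted time waiting in the batch queue
    run_time: float    # predicted execution time on this hardware
    transfer: float    # predicted time to stage input data

    @property
    def completion(self) -> float:
        # Assumption: the three phases happen one after another.
        return self.queue_wait + self.transfer + self.run_time


def pick_resource(estimates):
    """Return the estimate with the smallest predicted completion time."""
    if not estimates:
        raise ValueError("no candidate resources")
    return min(estimates, key=lambda e: e.completion)


# Made-up numbers: cluster_a is faster and closer to the data,
# but its batch queue is heavily loaded.
candidates = [
    Estimate("cluster_a", queue_wait=3600.0, run_time=600.0, transfer=30.0),
    Estimate("cluster_b", queue_wait=120.0, run_time=900.0, transfer=300.0),
]

# Placement that ignores queue wait (data locality and speed only).
queue_blind = min(candidates, key=lambda e: e.transfer + e.run_time)

# Placement that also accounts for the predicted queue wait.
best = pick_resource(candidates)

print(queue_blind.resource, best.resource, best.completion)
```

With these invented numbers, the queue-blind choice is the resource with better data locality, while the queue-aware choice switches to the other resource because its shorter predicted wait outweighs the slower transfer and execution.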

C. Koelbel, K. Kennedy, J. Brevik, A. Mandal, D. Nurmi and R. Wolski, "Evaluation of a Workflow Scheduler Using Integrated Performance Modelling and Batch Queue Wait Time Prediction," SC Conference (SC), Tampa, Florida, 2006, pp. 29.