High-Performance Distributed Computing, International Symposium on (2002)
Edinburgh, Scotland
July 24, 2002 to July 26, 2002
ISSN: 1082-8907
ISBN: 0-7695-1686-6
pp: 352
Kavitha Ranganathan, University of Chicago
Ian Foster, University of Chicago and Argonne National Laboratory
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources.

We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system.
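The decoupled arrangement described in the abstract can be illustrated with a minimal sketch. It is not one of the paper's algorithms: the `Site` class, the functions `schedule_job` and `replicate_popular`, the shortest-queue tie-break, and the `threshold` parameter are all hypothetical names and policies chosen for illustration. The only point it tries to show is that the job scheduler reads the current replica locations without ever moving data, while a separate routine replicates data based on observed accesses.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical grid site with local replicas and a job queue."""
    name: str
    datasets: set = field(default_factory=set)  # replicas stored at this site
    queue: list = field(default_factory=list)   # jobs waiting to run here


def schedule_job(sites, dataset):
    """Job scheduling: prefer sites already holding the data, then the shortest queue.

    This function never triggers a transfer; it only reads replica state.
    """
    holders = [s for s in sites if dataset in s.datasets]
    candidates = holders or sites  # fall back to any site if no replica exists
    target = min(candidates, key=lambda s: len(s.queue))
    target.queue.append(dataset)
    return target


def replicate_popular(sites, access_log, threshold):
    """Data scheduling: runs separately, driven only by observed accesses.

    Each dataset requested at least `threshold` times is copied to the
    least-loaded site that does not yet hold it (an assumed policy).
    """
    moved = []
    for dataset, count in Counter(access_log).items():
        if count < threshold:
            continue
        lacking = [s for s in sites if dataset not in s.datasets]
        if lacking:
            target = min(lacking, key=lambda s: len(s.queue))
            target.datasets.add(dataset)
            moved.append((dataset, target.name))
    return moved
```

In this sketch, repeated jobs on one dataset pile up at the single site holding it until the replication pass runs; afterwards the same scheduler spreads new jobs to the fresh replica without any change to its own logic.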
Kavitha Ranganathan, Ian Foster, "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications", High-Performance Distributed Computing, International Symposium on, pp. 352, 2002, doi:10.1109/HPDC.2002.1029935