In this paper, we consider task-level scheduling algorithms with respect to budget and deadline constraints for a batch of MapReduce jobs on a set of provisioned heterogeneous (virtual) machines in cloud platforms. The heterogeneity is manifested in the popular ”pay-as-you-go” charging model where the service machines with different performance would have different service rates. We organize the batch of jobs as a -stage workflow and study two related optimization problems, depending on whether the constraints are on monetary budget or on scheduling length of the workflow. First, given a total monetary budget B, by combining an in-stage local greedy algorithm (whose optimality is also proven) and dynamic programming (DP) techniques, we propose a global optimal scheduling algorithm to achieve minimum scheduling length of the workflow within O(B2). Although the optimal algorithm is efficient when B is polynomially bounded by the number of tasks in the MapReduce jobs, the quadratic time complexity is still high. To improve the efficiency, we further develop two greedy algorithms, called Global Greedy Budget (GGB) and Gradual Refinement (GR), each adopting different greedy strategies. In GGB we extend the idea of the local greedy algorithm to the efficient global distribution of the budget with minimum scheduling length as a goal whilst in GR we iteratively apply the DP algorithm to the distribution of exponentially reduced budget so that the solutions are gradually refined. Second, we consider the optimization problem of minimizing cost when the (time) deadline of the computation D is fixed. We convert this problem into the standard Multiple-Choice Knapsack Problem via a parallel transformation. Our empirical studies verify the proposed optimal algorithms and show the efficiencies of the greedy algorithms in cost-effectiveness to distribute the budget for performance optimizations of the MapReduce workflows.
Wei Shi, "Budget-Driven Scheduling Algorithms for Batches of MapReduce Jobs in Heterogeneous Clouds", IEEE Transactions on Cloud Computing, , no. 1, pp. 1, PrePrints PrePrints, doi:10.1109/TCC.2014.2316812