Issue No.04 - Fourth Quarter (2012 vol.5)
pp: 512-524
Sangho Yi , INRIA, Grenoble
Artur Andrzejak , Heidelberg University, Heidelberg
Derrick Kondo , INRIA, Grenoble
ABSTRACT
Recently introduced spot instances in the Amazon Elastic Compute Cloud (EC2) offer low resource costs in exchange for reduced reliability: these instances can be revoked abruptly due to price and demand fluctuations. Mechanisms and tools that address the cost-reliability tradeoffs under this scheme are of great value to users seeking to lower their costs while maintaining high reliability. We study how two mechanisms, checkpointing and migration, can be used to minimize the cost and volatility of resource provisioning. Based on the real price history of EC2 spot instances, we compare several adaptive checkpointing schemes in terms of monetary cost and improvement in job completion time. We also evaluate schemes that apply predictive methods to spot prices. Furthermore, we study how work migration can improve task completion in the midst of failures while keeping monetary costs low. Trace-based simulations show that our schemes can significantly reduce both the monetary cost and the task completion time of computation on spot instances.
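The cost-reliability tradeoff described in the abstract can be illustrated with a small trace-driven simulation. The sketch below is not the authors' simulator or any of their schemes: it assumes a fixed checkpoint interval, checkpoints with no time or storage overhead, and billing at the spot price for every step the instance runs. The function name `simulate`, its parameters, and the price trace are all hypothetical, and the trace is synthetic rather than real EC2 price history.

```python
from typing import Optional, Sequence, Tuple


def simulate(prices: Sequence[int], bid: int, work: int,
             interval: Optional[int]) -> Optional[Tuple[int, int]]:
    """Replay a price trace and return (elapsed_steps, total_cost).

    prices   -- spot price per time step, in cents (synthetic in this sketch)
    bid      -- user's bid in cents; the instance runs only while price <= bid
    work     -- number of running steps the job needs
    interval -- checkpoint after this many unsaved steps; None disables it
    Returns None if the trace ends before the job completes.
    """
    saved = 0      # work protected by the last checkpoint
    progress = 0   # work done so far, including unsaved work
    cost = 0
    for step, price in enumerate(prices, start=1):
        if price > bid:
            # Out-of-bid: the instance is revoked and unsaved work is lost.
            progress = saved
            continue
        cost += price          # assumption: pay the spot price for each running step
        progress += 1
        if progress >= work:
            return step, cost
        if interval is not None and progress - saved >= interval:
            saved = progress   # assumption: checkpointing itself is free
    return None


# Synthetic trace: one price spike at step 4 revokes the instance.
TRACE = [30, 30, 30, 50, 30, 30, 30, 30, 30, 30]

if __name__ == "__main__":
    print("no checkpointing:", simulate(TRACE, bid=40, work=5, interval=None))
    print("checkpoint every step:", simulate(TRACE, bid=40, work=5, interval=1))
```

On this toy trace, the run without checkpointing restarts from scratch after the price spike, so it both finishes later and pays for more running steps than the checkpointed run. A more faithful model would charge a time and storage cost per checkpoint, which is what makes the choice of checkpointing interval a real tradeoff.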
INDEX TERMS
Checkpointing, electronic commerce, pricing, probability density function, probability distribution, cloud computing, volatile resources, reliability, fault tolerance
CITATION
Sangho Yi, Artur Andrzejak, Derrick Kondo, "Monetary Cost-Aware Checkpointing and Migration on Amazon Cloud Spot Instances", IEEE Transactions on Services Computing, vol. 5, no. 4, pp. 512-524, Fourth Quarter 2012, doi:10.1109/TSC.2011.44