Issue No.06 - June (2013 vol.24)
pp: 1234-1244
Dong Yuan , Swinburne University of Technology, Melbourne
Yun Yang , Swinburne University of Technology, Melbourne
Xiao Liu , Swinburne University of Technology, Melbourne
Wenhao Li , Swinburne University of Technology, Melbourne
Lizhen Cui , Shandong University, Jinan
Meng Xu , Shandong University, Jinan
Jinjun Chen , University of Technology Sydney, Sydney
ABSTRACT
The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation- and data-intensive applications without infrastructure investment, and large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for storage or impractical to use at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel, highly cost-effective and practical storage strategy that can automatically decide at runtime whether a generated data set should be stored in the cloud. The main focus of this strategy is local optimization of the tradeoff between computation and storage, while secondarily taking users' (optional) storage preferences into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real-world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and that it is efficient enough for practical runtime use in the cloud.
INDEX TERMS
Benchmark testing, Computational modeling, Materials, Runtime, Delay, Algorithm design and analysis, Cloud computing, Data sets storage, computation-storage tradeoff, computation- and data-intensive applications
CITATION
Dong Yuan, Yun Yang, Xiao Liu, Wenhao Li, Lizhen Cui, Meng Xu, Jinjun Chen, "A Highly Practical Approach toward Achieving Minimum Data Sets Storage Cost in the Cloud", IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 6, pp. 1234-1244, June 2013, doi:10.1109/TPDS.2013.20
REFERENCES
[1] "Amazon Cloud Services," http://aws.amazon.com/, 2013.
[2] I. Adams, D.D.E. Long, E.L. Miller, S. Pasupathy, and M.W. Storer, "Maximizing Efficiency by Trading Storage for Computation," Proc. Workshop Hot Topics in Cloud Computing, 2009.
[3] S. Agarwala, D. Jadav, and L.A. Bathen, "iCostale: Adaptive Cost Optimization for Storage Clouds," Proc. IEEE Int'l Conf. Cloud Computing, pp. 436-443, 2011.
[4] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A View of Cloud Computing," Comm. ACM, vol. 53, pp. 50-58, 2010.
[5] R. Bose and J. Frew, "Lineage Retrieval for Scientific Data Processing: A Survey," ACM Computing Surveys, vol. 37, pp. 1-28, 2005.
[6] A. Burton and A. Treloar, "Publish My Data: A Composition of Services from ANDS and ARCS," Proc. IEEE Fifth Int'l Conf. e-Science, pp. 164-170, 2009.
[7] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the Fifth Utility," Future Generation Computer Systems, vol. 25, pp. 599-616, 2009.
[8] R. Campbell, I. Gupta, M. Heath, S.Y. Ko, M. Kozuch, M. Kunze, T. Kwan, K. Lai, H.Y. Lee, M. Lyons, D. Milojicic, D. O'Hallaron, and Y.C. Soh, "Open Cirrus™ Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research," Proc. Workshop Hot Topics in Cloud Computing, 2009.
[9] J. Chen and Y. Yang, "Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems," ACM Trans. Software Eng. and Methodology, vol. 20, article 9, 2011.
[10] Y. Cui, H. Wang, and X. Cheng, "Channel Allocation in Wireless Data Center Networks," Proc. IEEE INFOCOM, pp. 1395-1403, 2011.
[11] E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-Science: An Overview of Workflow System Features and Capabilities," Future Generation Computer Systems, vol. 25, pp. 528-540, 2009.
[12] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, "The Cost of Doing Science on the Cloud: The Montage Example," Proc. ACM/IEEE Conf. Supercomputing, 2008.
[13] I. Foster, J. Vockler, M. Wilde, and Y. Zhao, "Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation," Proc. 14th Int'l Conf. Scientific and Statistical Database Management, pp. 37-46, 2002.
[14] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared," Proc. Grid Computing Environments Workshop, 2008.
[15] S.K. Garg, R. Buyya, and H.J. Siegel, "Time and Cost Trade-Off Management for Scheduling Parallel Applications on Utility Grids," Future Generation Computer Systems, vol. 26, pp. 1344-1355, 2010.
[16] P.K. Gunda, L. Ravindranath, C.A. Thekkath, Y. Yu, and L. Zhuang, "Nectar: Automatic Management of Data and Computation in Datacenters," Proc. Ninth Symp. Operating Systems Design and Implementation, pp. 1-14, 2010.
[17] X. Jia, D. Li, H. Du, and J. Cao, "On Optimal Replication of Data Object at Hierarchical and Transparent Web Proxies," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 8, pp. 673-685, Aug. 2005.
[18] G. Juve, E. Deelman, K. Vahi, and G. Mehta, "Data Sharing Options for Scientific Workflows on Amazon EC2," Proc. ACM/IEEE Conf. Supercomputing, 2010.
[19] H. Khazaei, J. Misic, and V. Misic, "Performance Analysis of Cloud Computing Centers Using M/G/m/m+r Queueing Systems," IEEE Trans. Parallel and Distributed Systems, vol. 23, no. 5, pp. 936-943, May 2012.
[20] D. Kondo, B. Javadi, P. Malecot, F. Cappello, and D.P. Anderson, "Cost-Benefit Analysis of Cloud Computing versus Desktop Grids," Proc. 23rd Int'l Parallel and Distributed Processing Symp., 2009.
[21] X. Liu, D. Yuan, G. Zhang, W. Li, D. Cao, Q. He, J. Chen, and Y. Yang, The Design of Cloud Workflow Systems. Springer, 2012.
[22] B. Ludascher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, and E.A. Lee, "Scientific Workflow Management and the Kepler System," Concurrency and Computation: Practice and Experience, vol. 18, pp. 1039-1065, 2005.
[23] K.-K. Muniswamy-Reddy, P. Macko, and M. Seltzer, "Provenance for the Cloud," Proc. Eighth USENIX Conf. File and Storage Technology, pp. 197-210, 2010.
[24] L.J. Osterweil, L.A. Clarke, A.M. Ellison, R. Podorozhny, A. Wise, E. Boose, and J. Hadley, "Experience in Using A Process Language to Define Scientific Workflow and Generate Data Set Provenance," Proc. 16th ACM SIGSOFT Int'l Symp. Foundations of Software Eng., pp. 319-329, 2008.
[25] A.S. Szalay and J. Gray, "Science in an Exponential World," Nature, vol. 440, pp. 23-24, 2006.
[26] D. Warneke and O. Kao, "Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 6, pp. 985-997, June 2011.
[27] Y.C. Lee and A.Y. Zomaya, "Energy Conscious Scheduling for Distributed Computing Systems under Different Operating Conditions," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 8, pp. 1374-1381, Aug. 2011.
[28] D. Yuan, Y. Yang, X. Liu, and J. Chen, "A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflows," Proc. 24th Int'l Parallel and Distributed Processing Symp., 2010.
[29] D. Yuan, Y. Yang, X. Liu, and J. Chen, "A Local-Optimisation Based Strategy for Cost-Effective Data Sets Storage of Scientific Applications in the Cloud," Proc. IEEE Fourth Int'l Conf. Cloud Computing, pp. 179-186, 2011.
[30] D. Yuan, Y. Yang, X. Liu, and J. Chen, "On-Demand Minimum Cost Benchmarking for Intermediate Data Sets Storage in Scientific Cloud Workflow Systems," J. Parallel and Distributed Computing, vol. 71, pp. 316-332, 2011.
[31] D. Yuan, Y. Yang, X. Liu, G. Zhang, and J. Chen, "A Data Dependency Based Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems," Concurrency and Computation: Practice and Experience, vol. 24, pp. 956-976, 2012.
[32] M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce Performance in Heterogeneous Environments," Proc. Eighth USENIX Symp. Operating Systems Design and Implementation, pp. 29-42, 2008.